Employee Turnover Analysis

1 Introduction

1.1 Business Problem

Over the past year, Fictional Solutions Inc. has been grappling with a troubling rise in employee turnover. These departures have disrupted team dynamics, increased recruitment and training costs, and jeopardized critical project deadlines. While exit surveys highlight dissatisfaction with workload, career growth, and recognition, the true drivers of attrition remain unclear, leaving managers uncertain about how to intervene effectively.

This project tackles two key challenges:

  1. Understanding Turnover Drivers: Analyzing factors like satisfaction, performance, and workload to uncover the root causes of employee departures.

  2. Predicting Turnover Risk: Developing a model to identify employees most likely to leave, enabling proactive, targeted retention strategies.

By uncovering the reasons behind attrition and identifying at-risk employees, Fictional Solutions Inc. can implement data-driven strategies to improve retention and rebuild workforce stability.

Note: This is a fictional scenario I created based on a dataset obtained from Kaggle. While hypothetical, it demonstrates how data-driven approaches can help organizations like Fictional Solutions Inc. tackle retention challenges effectively.

1.2 Project Objectives

In this project, I will:

  • Prepare: Clean and preprocess the dataset, checking for (and addressing, where present) missing values, outliers, and inconsistencies so the data is ready for analysis.

  • Explore: Conduct exploratory data analysis (EDA) to uncover distributions, trends, and relationships among key variables related to employee turnover.

  • Model: Apply statistical and machine learning techniques to identify factors associated with employee turnover and predict individuals at high risk of leaving.

  • Interpret: Generate actionable insights and provide data-driven recommendations based on the findings.

  • Visualize: Develop an interactive R Shiny dashboard to present findings in a clear, engaging format for stakeholders.

1.2.1 Skills Highlighted

  • Data Preparation: Cleaning, transforming, and pre-processing raw data for analysis.

  • Statistical Modeling: Logistic regression for understanding relationships and predicting outcomes.

  • Machine Learning: Random forests and neural networks for advanced prediction.

  • Data Visualization: Creating clear, insightful visuals to communicate results effectively.

  • Tools: Proficiency in R and R Shiny for analysis and dashboard development.

1.3 Dataset Overview

The data comprises 14,999 employee records with the following variables:

Variable                 Type                Description
left                     Binary (outcome)    Whether the employee left (1 = yes, 0 = no).
satisfaction_level       Continuous          Level of job satisfaction, ranging from 0 to 1.
last_evaluation          Continuous          Most recent performance evaluation score, ranging from 0 to 1.
number_project           Discrete            Count of projects handled.
average_monthly_hours    Continuous          Average number of hours worked monthly.
time_spend_company       Discrete            Years spent at the company.
work_accident            Binary              Whether the employee had a work accident (1 = yes, 0 = no).
promotion_last_5years    Binary              Whether the employee was promoted in the last 5 years (1 = yes, 0 = no).
department               Categorical         Employee’s department.
salary                   Categorical         Salary level (low, medium, high).

1.4 Modeling Approach

This project takes a structured approach to modeling, starting with simpler methods and progressing to more complex ones to understand and predict employee turnover risk. The goal is to balance interpretability and predictive accuracy while considering the strengths and limitations of each method.

Outcome Variable and Predictors

The target variable is left (binary: 1 = employee left, 0 = employee stayed). The predictors include:

  • Continuous Variables: satisfaction_level, last_evaluation, average_monthly_hours

  • Discrete Variables: number_project, time_spend_company

  • Binary Variables: work_accident, promotion_last_5years

  • Categorical Variables: department, salary

Steps in the Modeling Process

  1. Logistic Regression:
    I start with logistic regression as a baseline model. Logistic regression is well-suited for binary outcomes, offering clear insights into the relationships between predictors and the probability of turnover. This step focuses on understanding the relative importance of each predictor and establishing a baseline for predictive performance.

  2. Multilayer Perceptron (MLP):
    Next, I train a Multilayer Perceptron, an artificial neural network model. MLPs can capture complex, non-linear interactions between predictors and the outcome, potentially uncovering patterns that logistic regression cannot. However, MLPs require careful tuning to prevent overfitting, particularly with smaller datasets.

  3. Random Forest:
    Finally, I use a Random Forest model. This ensemble learning approach builds multiple decision trees and aggregates their outputs, offering robust predictions and reducing overfitting risk. Random Forests are particularly useful when dealing with categorical and continuous predictors, and they provide insights into variable importance.

  4. Evaluating Model Performance:
    I compare the three models by examining their accuracy on a held-out test set to determine which approach offers the best predictive performance in this context. I also weigh the pros and cons of each method, and how their suitability might depend on what we’re trying to accomplish (e.g., better understanding predictors of turnover vs. predicting future turnover).
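To make the baseline-plus-holdout workflow concrete, the sketch below fits a logistic regression on a training split and scores its accuracy on a test split. It runs on simulated data whose names merely mirror the real dataset (not on employee_retention_data itself), so the numbers are illustrative only.

```r
# Simulate a small stand-in for the real data; names mirror the actual columns
set.seed(123)
n <- 1000
sim <- data.frame(
  satisfaction_level    = runif(n),
  average_monthly_hours = rnorm(n, 200, 50)
)
# Turnover probability decreases with satisfaction (assumed, for illustration)
p <- plogis(2 - 5 * sim$satisfaction_level)
sim$left <- rbinom(n, 1, p)

# 70/30 train-test split
train_idx <- sample(seq_len(n), size = 0.7 * n)
train    <- sim[train_idx, ]
test_set <- sim[-train_idx, ]

# Baseline logistic regression
fit <- glm(left ~ satisfaction_level + average_monthly_hours,
           data = train, family = binomial)

# Classify with a 0.5 probability cutoff and compute test accuracy
pred <- as.numeric(predict(fit, newdata = test_set, type = "response") > 0.5)
accuracy <- mean(pred == test_set$left)
```

The same split and accuracy calculation can then be reused for the MLP and Random Forest so the three models are compared on identical held-out data.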

Logistic Regression

Pros:

  • Simple and highly interpretable.

  • Coefficients directly explain the effect of features.

  • Provides probabilistic outputs for class membership.

  • Computationally efficient and doesn’t require as much data.

Cons:

  • Assumes a linear relationship between predictors and the log-odds of the outcome.

  • Sensitive to multicollinearity among predictors.

  • Requires pre-processing (e.g., predictor scaling).

  • Limited ability to model complex interactions/patterns.

  • Requires imputation for missing values.

  • More suited to structured/tabular data.

Multilayer Perceptron

Pros:

  • Can capture non-linear, complex relationships.

  • Highly customizable (layers, activation functions, etc.).

  • Can handle unstructured data (e.g., images, audio, text).

Cons:

  • Functions as a black box, making interpretability difficult.

  • Computationally intensive.

  • Requires a lot of data (can overfit with insufficient data).

  • Requires pre-processing (e.g., feature scaling, one-hot encoding for categorical variables).

  • Requires imputation for missing values.

Random Forest

Pros:

  • Can capture non-linear relationships.

  • Reduces overfitting (through ensemble averaging).

  • Works with mixed data types (e.g., numerical, categorical) with minimal pre-processing.

  • Provides feature importance metrics for insights into predictors.

  • Doesn’t require imputation for missing values.

Cons:

  • Computationally intensive, especially with large datasets or many trees.

  • May favor features with high cardinality (e.g., categorical variables with many unique values).

  • More suited to structured/tabular data.

Show the code
#-----------------------------------------------------------------------------
# Initial Setup & Packages ---------------------------------------------------
#-----------------------------------------------------------------------------

# Clear Environment
rm(list=ls())

options(digits = 4)
options(max.print = 2000)

# Load packages
suppressMessages(
suppressWarnings(
  pacman::p_load(
    readxl,
    tidyverse, 
    skimr,
    ggplot2,
    plotly,
    highcharter,
    caret, # to create confusion matrices
    randomForest,
    corrplot,
    Hmisc,
    tidymodels,
    finalfit,
    performance,
    pscl, car,
    data.table,
    lme4
  )))

# Turning off scientific notation
options(scipen = 999)

#-----------------------------------------------------------------------------
# Importing Data -------------------------------------------------------------
#-----------------------------------------------------------------------------

employee_retention_data <- read_excel("C:/Users/ClaudieCoulombe/OneDrive - System 3/General P/Employee Turnover/employee_turnover/data/hr_retention_data.xlsx")

2 Data Preparation

2.1 Checking for Missing Values

First, we check whether there is any missing data in our dataset.

Show the code
#-----------------------------------------------------------------------------
# Check for Missing Values ---------------------------------------------------
#-----------------------------------------------------------------------------

sapply(employee_retention_data, function(x) sum(is.na(x)))
   satisfaction_level       last_evaluation        number_project 
                    0                     0                     0 
 average_montly_hours    time_spend_company         Work_accident 
                    0                     0                     0 
                 left promotion_last_5years                 sales 
                    0                     0                     0 
               salary 
                    0 

No missing values were detected in this dataset.

If missing values had been present, I would have approached the issue systematically by:

  1. Understanding the Missing Data: Determining how many values are missing from each column and whether the data is missing at random (i.e., whether missingness is related to other variables in the dataset).

  2. If Data is Missing at Random: I would use multiple imputation, which models how the variable with missing data relates to the other variables in the dataset and uses those relationships to predict plausible values for the missing entries. This process is repeated multiple times to account for uncertainty, producing several completed datasets. After analyzing each completed dataset separately, the results would be pooled into final estimates.

  3. If Data is Not Missing at Random: In this case, the missingness is related to the missing values themselves, so I wouldn’t use multiple imputation, which assumes the missingness is random. Some options I would consider instead include:

    • Testing how the results change under different assumptions about the missing data (e.g., if satisfaction scores are missing, testing scenario 1 where the missing scores are above average vs. scenario 2 where the missing scores are below average). I would compare the extent to which the coefficients change or remain consistent across these scenarios.

    • Considering external data, such as historical datasets from Fictional Solutions Inc. (if available). For instance, if performance scores correlated highly with satisfaction scores in past datasets, that pattern could be used to predict missing satisfaction scores in the current data.
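As a rough illustration of the multiple-imputation logic described in step 2 above, the base-R sketch below imputes a variable from its relationship with another variable, repeats the imputation several times with added noise to reflect uncertainty, and pools the results. In practice I would use a dedicated package such as mice; the data and variable names here are simulated stand-ins.

```r
# Simulate data where satisfaction depends on last_evaluation,
# then knock out 10% of satisfaction values at random
set.seed(42)
n <- 500
dat <- data.frame(last_evaluation = runif(n, 0.4, 1))
dat$satisfaction_level <- 0.2 + 0.6 * dat$last_evaluation + rnorm(n, 0, 0.1)
dat$satisfaction_level[sample(n, 50)] <- NA

m <- 5                     # number of imputations
estimates <- numeric(m)
for (i in seq_len(m)) {
  # Model the incomplete variable from its observed relationship
  observed <- !is.na(dat$satisfaction_level)
  fit <- lm(satisfaction_level ~ last_evaluation, data = dat[observed, ])

  # Predict missing values, adding residual noise to reflect uncertainty
  imputed <- dat
  imputed$satisfaction_level[!observed] <-
    predict(fit, newdata = dat[!observed, , drop = FALSE]) +
    rnorm(sum(!observed), 0, sigma(fit))

  # Analyze each completed dataset; here, the mean satisfaction level
  estimates[i] <- mean(imputed$satisfaction_level)
}
pooled_mean <- mean(estimates)  # Pool results across imputations
```

A full analysis would pool model coefficients and their standard errors (Rubin’s rules) rather than a single mean, which is exactly what packages like mice automate.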

2.2 Checking Data Types

Next, we verify that each variable is stored with the correct data type by inspecting the dataset.

Show the code
#-----------------------------------------------------------------------------
# Check Variables and Data Types ---------------------------------------------
#-----------------------------------------------------------------------------

# Check data types
 str(employee_retention_data)
tibble [14,999 × 10] (S3: tbl_df/tbl/data.frame)
 $ satisfaction_level   : num [1:14999] 0.38 0.8 0.11 0.72 0.37 0.41 0.1 0.92 0.89 0.42 ...
 $ last_evaluation      : num [1:14999] 0.53 0.86 0.88 0.87 0.52 0.5 0.77 0.85 1 0.53 ...
 $ number_project       : num [1:14999] 2 5 7 5 2 2 6 5 5 2 ...
 $ average_montly_hours : num [1:14999] 157 262 272 223 159 153 247 259 224 142 ...
 $ time_spend_company   : num [1:14999] 3 6 4 5 3 3 4 5 5 3 ...
 $ Work_accident        : num [1:14999] 0 0 0 0 0 0 0 0 0 0 ...
 $ left                 : num [1:14999] 1 1 1 1 1 1 1 1 1 1 ...
 $ promotion_last_5years: num [1:14999] 0 0 0 0 0 0 0 0 0 0 ...
 $ sales                : chr [1:14999] "sales" "sales" "sales" "sales" ...
 $ salary               : chr [1:14999] "low" "medium" "medium" "low" ...

We can see that some categorical variables are stored with the wrong data type: left is stored as numeric, while sales and salary are stored as character. All three should be factors, so we convert them in the next step and then inspect the dataset again to confirm that the change has worked.

Show the code
# Convert categorical variables (left, sales, salary) to factors
employee_retention_data <- employee_retention_data %>%
  mutate(left = as.factor(left),
         salary = as.factor(salary),
         sales = as.factor(sales))

# Check data types again
str(employee_retention_data)
tibble [14,999 × 10] (S3: tbl_df/tbl/data.frame)
 $ satisfaction_level   : num [1:14999] 0.38 0.8 0.11 0.72 0.37 0.41 0.1 0.92 0.89 0.42 ...
 $ last_evaluation      : num [1:14999] 0.53 0.86 0.88 0.87 0.52 0.5 0.77 0.85 1 0.53 ...
 $ number_project       : num [1:14999] 2 5 7 5 2 2 6 5 5 2 ...
 $ average_montly_hours : num [1:14999] 157 262 272 223 159 153 247 259 224 142 ...
 $ time_spend_company   : num [1:14999] 3 6 4 5 3 3 4 5 5 3 ...
 $ Work_accident        : num [1:14999] 0 0 0 0 0 0 0 0 0 0 ...
 $ left                 : Factor w/ 2 levels "0","1": 2 2 2 2 2 2 2 2 2 2 ...
 $ promotion_last_5years: num [1:14999] 0 0 0 0 0 0 0 0 0 0 ...
 $ sales                : Factor w/ 10 levels "accounting","hr",..: 8 8 8 8 8 8 8 8 8 8 ...
 $ salary               : Factor w/ 3 levels "high","low","medium": 2 3 3 2 2 2 2 2 2 2 ...

We can also see issues with some variable names: average_montly_hours contains a typo, and Work_accident is inconsistently capitalized. We correct both in the code below, rename sales to the more accurate department, and verify that the changes were applied.

Show the code
#-----------------------------------------------------------------------------
# Correct Typo in Variable Name ----------------------------------------------
#-----------------------------------------------------------------------------

employee_retention_data <- employee_retention_data %>%
  rename(average_monthly_hours = average_montly_hours,
         work_accident = Work_accident,
         department=sales)

# Verify that the name correction worked. 
str(employee_retention_data)
tibble [14,999 × 10] (S3: tbl_df/tbl/data.frame)
 $ satisfaction_level   : num [1:14999] 0.38 0.8 0.11 0.72 0.37 0.41 0.1 0.92 0.89 0.42 ...
 $ last_evaluation      : num [1:14999] 0.53 0.86 0.88 0.87 0.52 0.5 0.77 0.85 1 0.53 ...
 $ number_project       : num [1:14999] 2 5 7 5 2 2 6 5 5 2 ...
 $ average_monthly_hours: num [1:14999] 157 262 272 223 159 153 247 259 224 142 ...
 $ time_spend_company   : num [1:14999] 3 6 4 5 3 3 4 5 5 3 ...
 $ work_accident        : num [1:14999] 0 0 0 0 0 0 0 0 0 0 ...
 $ left                 : Factor w/ 2 levels "0","1": 2 2 2 2 2 2 2 2 2 2 ...
 $ promotion_last_5years: num [1:14999] 0 0 0 0 0 0 0 0 0 0 ...
 $ department           : Factor w/ 10 levels "accounting","hr",..: 8 8 8 8 8 8 8 8 8 8 ...
 $ salary               : Factor w/ 3 levels "high","low","medium": 2 3 3 2 2 2 2 2 2 2 ...

3 Exploratory Data Analysis (EDA)

3.1 Basic Descriptive Statistics

For numeric variables, we calculate some basic descriptive statistics (i.e., mean, standard deviation, minimum, 1st quartile, median, 3rd quartile, and maximum).

Show the code
# Customize skimr to include min and max
my_skim <- skim_with(
  numeric = sfl(
    iqr = IQR,
    min = min,
    max = max,
    mean = mean,
    median = median,
    sd = sd,
    q1 = ~ quantile(.x, probs = .25),
    q3 = ~ quantile(.x, probs = .75)
  ),
  append = TRUE
)

eda_summary <- my_skim(employee_retention_data)

# Process and round numeric_summary
numeric_summary <- eda_summary %>%
  filter(skim_type == "numeric") %>%
  select(skim_variable, n_missing, numeric.mean, numeric.sd, numeric.min, numeric.max, numeric.q1, numeric.median, numeric.q3, numeric.iqr, numeric.hist) %>%
  rename(
    variable = skim_variable,
    mean = numeric.mean,
    sd = numeric.sd,
    min = numeric.min,
    max = numeric.max,
    q1 = numeric.q1,
    median = numeric.median,
    q3 = numeric.q3,
    iqr = numeric.iqr,
    hist=numeric.hist
  ) %>%
  mutate(across(where(is.numeric), ~ round(.x, 2)))  # Round all numeric columns to 2 decimal places

numeric_summary
# A tibble: 7 × 11
  variable   n_missing   mean    sd   min   max     q1 median     q3   iqr hist 
* <chr>          <dbl>  <dbl> <dbl> <dbl> <dbl>  <dbl>  <dbl>  <dbl> <dbl> <chr>
1 satisfact…         0   0.61  0.25  0.09     1   0.44   0.64   0.82  0.38 ▃▅▇▇▇
2 last_eval…         0   0.72  0.17  0.36     1   0.56   0.72   0.87  0.31 ▂▇▆▇▇
3 number_pr…         0   3.8   1.23  2        7   3      4      5     2    ▇▆▃▂▁
4 average_m…         0 201.   49.9  96      310 156    200    245    89    ▃▇▆▇▂
5 time_spen…         0   3.5   1.46  2       10   3      3      4     1    ▇▃▁▁▁
6 work_acci…         0   0.14  0.35  0        1   0      0      0     0    ▇▁▁▁▂
7 promotion…         0   0.02  0.14  0        1   0      0      0     0    ▇▁▁▁▁

For categorical/factor variables, we find the count for each level.

Show the code
# Get counts for each level in each factor variable

xtabs(~ left, data=employee_retention_data)
left
    0     1 
11428  3571 
Show the code
xtabs(~ salary, data=employee_retention_data)
salary
  high    low medium 
  1237   7316   6446 
Show the code
xtabs(~ department, data=employee_retention_data)
department
 accounting          hr          IT  management   marketing product_mng 
        767         739        1227         630         858         902 
      RandD       sales     support   technical 
        787        4140        2229        2720 

3.2 Correlations

Next, we examine the correlations between our variables, using Pearson’s correlation or point-biserial correlation (a special case of Pearson’s when correlating a continuous and dichotomous variable).

Show the code
#-----------------------------------------------------------------------------
# Create a Correlation Matrix Heatmap ----------------------------------------
#-----------------------------------------------------------------------------

# Convert 'left' to numeric for correlation analysis
employee_retention_data$left_numeric <- as.numeric(as.character(employee_retention_data$left))

# Select relevant continuous variables
correlation_data <- employee_retention_data %>%
  dplyr::select(satisfaction_level, last_evaluation, average_monthly_hours, number_project, time_spend_company, left_numeric)

# Run the correlation matrix with p-values
cor_results <- rcorr(as.matrix(correlation_data))
cor_matrix <- round(cor_results$r, 2)  # Correlation coefficients
p_matrix <- cor_results$P              # P-values

# Convert correlation and p-value matrices to long format
cor_long <- as.data.frame(as.table(cor_matrix))
p_long <- as.data.frame(as.table(p_matrix))
names(cor_long) <- c("x", "y", "value")
names(p_long) <- c("x", "y", "p_value")

# Merge the correlation and p-value data frames
cor_data <- merge(cor_long, p_long, by = c("x", "y"))

# Add significance stars based on p-values
cor_data <- cor_data %>%
  mutate(
    significance = case_when(
      p_value < 0.001 ~ "***",
      p_value < 0.01 ~ "**",
      p_value < 0.05 ~ "*",
      TRUE ~ ""
    ),
    label = paste0(round(value, 2), significance)  # Combine value and significance into one label
  )

# Create the heatmap with highcharter
highchart() %>%
  hc_add_series(data = cor_data, type = "heatmap", hcaes(x = x, y = y, value = value)) %>%
  hc_colorAxis(stops = color_stops(colors = c("#6D9EC1", "white", "#E46726"))) %>%  # Color gradient
  hc_title(text = "Correlation Matrix of Employee Retention Variables") %>%
  hc_tooltip(pointFormat = "{point.x} and {point.y}: {point.label}") %>%  # Tooltip with correlation and stars
  hc_xAxis(categories = colnames(cor_matrix), title = list(text = NULL)) %>%
  hc_yAxis(categories = colnames(cor_matrix), title = list(text = NULL), reversed = TRUE) %>%
  hc_plotOptions(heatmap = list(dataLabels = list(enabled = TRUE, format = '{point.label}')))  # Show labels with stars

Note. *** for p < 0.001, ** for p < 0.01, * for p < 0.05.

  1. Employee Satisfaction and Turnover: The correlation coefficient of -0.39*** indicates a significant negative relationship between employee satisfaction and turnover. This suggests that as employee satisfaction decreases, the likelihood of employees leaving the organization increases.
  2. Number of Projects and Turnover: The correlation coefficient is 0.02**, indicating a significant albeit quite small positive relationship between number of projects and turnover.
  3. Average Monthly Hours and Turnover: The correlation coefficient is 0.07**, indicating a significant albeit quite small positive relationship between the average number of hours worked monthly and turnover.
  4. Time Spent at Company and Turnover: The correlation coefficient is 0.14***, indicating a significant positive relationship.
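A quick aside on the point-biserial coefficient used above: it is simply Pearson’s r computed with the binary variable coded 0/1, which is why cor() and cor.test() can be applied directly. A small simulated example (stand-ins for satisfaction_level and left):

```r
# Simulated stand-ins: turnover probability falls as satisfaction rises
set.seed(7)
satisfaction <- runif(200)
left <- rbinom(200, 1, plogis(1 - 4 * satisfaction))

# Point-biserial correlation = Pearson's r with the 0/1 variable as numeric
r_pb <- cor(satisfaction, left)

# Significance test (cor.test defaults to Pearson)
ct <- cor.test(satisfaction, left)
```

With the negative relationship built into the simulation, r_pb comes out negative and the test is significant, mirroring the satisfaction-turnover pattern in the heatmap.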

3.3 Distributions

Correlations can be helpful, but they don’t always tell the full story. To get a better sense of the shape of the data, it helps to inspect it visually. In this section, we produce histograms displaying the distributions of key variables, grouped by turnover status, which can also help us spot any emerging trends.

Show the code
# Set theme_classic() as the default theme

theme_set(
  theme_classic() + 
    theme(legend.position = "right")
  )
Show the code
#-----------------------------------------------------------------------------
# Satisfaction Level by Turnover ---------------------------------------------
#-----------------------------------------------------------------------------

opacity_level <- 0.5 # Transparency level so overlapping histograms stay visible

hist_sat <- plot_ly()
hist_sat <- hist_sat %>%
  add_histogram(
    data = employee_retention_data[employee_retention_data$left == 0, ], 
    x = ~satisfaction_level, 
    name = "Stayed",
    xbins = list(start = 0, end = 1, size = 0.05), # Fixed bin size
    marker = list(color = "#00AFBB", 
                  opacity=opacity_level,
                  line = list(color = "#00AFBB", width = 1))
  ) %>%
  add_histogram(
    data = employee_retention_data[employee_retention_data$left == 1, ], 
    x = ~satisfaction_level, 
    name = "Left",
    xbins = list(start = 0, end = 1, size = 0.05), # Fixed bin size
    marker = list(color = "#E7B800", 
                  opacity=opacity_level,
                  line = list(color = "#E7B800", width = 1)) 
  ) %>%
  layout(
    barmode = "overlay", # Overlay histograms
    title = "Satisfaction Level by Employee Turnover Status",
    xaxis = list(title = "Satisfaction Level"),
    yaxis = list(title = "Frequency"),
    legend = list(title = list(text = "Turnover Status"))
  )

hist_sat
Show the code
#-----------------------------------------------------------------------------
# Employee Performance by Turnover -------------------------------------------
#-----------------------------------------------------------------------------

hist_perf <- plot_ly() %>%
  add_histogram(
    data = employee_retention_data[employee_retention_data$left == 0, ], 
    x = ~last_evaluation, 
    name = "Stayed",
    xbins = list(start = 0, end = 1, size = 0.05), 
    marker = list(color = "#00AFBB", opacity = 0.5, line = list(color = "#00AFBB", width = 1))
  ) %>%
  add_histogram(
    data = employee_retention_data[employee_retention_data$left == 1, ], 
    x = ~last_evaluation, 
    name = "Left",
    xbins = list(start = 0, end = 1, size = 0.05), 
    marker = list(color = "#E7B800", opacity = 0.5, line = list(color = "#E7B800", width = 1))
  ) %>%
  layout(
    barmode = "overlay",
    title = "Employee Performance on Last Evaluation by Turnover Status",
    xaxis = list(title = "Performance Rating on Last Evaluation"),
    yaxis = list(title = "Count"),
    legend = list(title = list(text = "Turnover Status"))
  )
hist_perf
Show the code
#-----------------------------------------------------------------------------
# Employee Workload (Monthly Hours) by Turnover ------------------------------
#-----------------------------------------------------------------------------

hist_avg_hours <- plot_ly() %>%
  add_histogram(
    data = employee_retention_data[employee_retention_data$left == 0, ], 
    x = ~average_monthly_hours, 
    name = "Stayed",
    xbins = list(start = min(employee_retention_data$average_monthly_hours), 
                 end = max(employee_retention_data$average_monthly_hours), 
                 size = 10), 
    marker = list(color = "#00AFBB", opacity = 0.5, line = list(color = "#00AFBB", width = 1))
  ) %>%
  add_histogram(
    data = employee_retention_data[employee_retention_data$left == 1, ], 
    x = ~average_monthly_hours, 
    name = "Left",
    xbins = list(start = min(employee_retention_data$average_monthly_hours), 
                 end = max(employee_retention_data$average_monthly_hours), 
                 size = 10), 
    marker = list(color = "#E7B800", opacity = 0.5, line = list(color = "#E7B800", width = 1))
  ) %>%
  layout(
    barmode = "overlay",
    title = "Employee Workload (Based on Average Monthly Hours) by Turnover Status",
    xaxis = list(title = "Average Monthly Hours"),
    yaxis = list(title = "Count"),
    legend = list(title = list(text = "Turnover Status"))
  )
hist_avg_hours
Show the code
#-----------------------------------------------------------------------------
# Employee Workload (Number of Projects) by Turnover -------------------------
#-----------------------------------------------------------------------------

hist_num_projects <- plot_ly() %>%
  add_histogram(
    data = employee_retention_data[employee_retention_data$left == 0, ], 
    x = ~number_project, 
    name = "Stayed",
    xbins = list(start = min(employee_retention_data$number_project), 
                 end = max(employee_retention_data$number_project), 
                 size = 1), 
    marker = list(color = "#00AFBB", opacity = 0.5, line = list(color = "#00AFBB", width = 1))
  ) %>%
  add_histogram(
    data = employee_retention_data[employee_retention_data$left == 1, ], 
    x = ~number_project, 
    name = "Left",
    xbins = list(start = min(employee_retention_data$number_project), 
                 end = max(employee_retention_data$number_project), 
                 size = 1), 
    marker = list(color = "#E7B800", opacity = 0.5, line = list(color = "#E7B800", width = 1))
  ) %>%
  layout(
    barmode = "overlay",
    title = "Employee Workload (Based on Number of Projects) by Turnover Status",
    xaxis = list(title = "Number of Projects"),
    yaxis = list(title = "Count"),
    legend = list(title = list(text = "Turnover Status"))
  )
hist_num_projects
Show the code
#-----------------------------------------------------------------------------
# Time Spent at Company by Turnover ------------------------------------------
#-----------------------------------------------------------------------------

hist_time_spent <- plot_ly() %>%
  add_histogram(
    data = employee_retention_data[employee_retention_data$left == 0, ], 
    x = ~time_spend_company, 
    name = "Stayed",
    xbins = list(start = min(employee_retention_data$time_spend_company), 
                 end = max(employee_retention_data$time_spend_company), 
                 size = 1), 
    marker = list(color = "#00AFBB", opacity = 0.5, line = list(color = "#00AFBB", width = 1))
  ) %>%
  add_histogram(
    data = employee_retention_data[employee_retention_data$left == 1, ], 
    x = ~time_spend_company, 
    name = "Left",
    xbins = list(start = min(employee_retention_data$time_spend_company), 
                 end = max(employee_retention_data$time_spend_company), 
                 size = 1), 
    marker = list(color = "#E7B800", opacity = 0.5, line = list(color = "#E7B800", width = 1))
  ) %>%
  layout(
    barmode = "overlay",
    title = "Time Spent at Company by Turnover Status",
    xaxis = list(title = "Number of Years at Company"),
    yaxis = list(title = "Count"),
    legend = list(title = list(text = "Turnover Status"))
  )
hist_time_spent
Show the code
#-----------------------------------------------------------------------------
# Work Accidents by Turnover --------------------------------------------------
#-----------------------------------------------------------------------------

accident_summary <- employee_retention_data %>%
  group_by(work_accident, left) %>%
  summarise(count = n(), .groups = "drop") %>%
  group_by(work_accident) %>%
  mutate(percentage = count / sum(count) * 100)

# Map levels to meaningful labels
accident_summary <- accident_summary %>%
  mutate(left = factor(left, levels = c("0", "1"), labels = c("Stayed", "Left")))

# Create a bar chart
hist_work_accident <- plot_ly(
  data = accident_summary,
  x = ~work_accident,
  y = ~percentage,
  color = ~left,  # Use the mapped factor for color
  type = "bar",
  colors = c("Stayed" = "#00AFBB", "Left" = "#E7B800"),  # Match color to new labels
  opacity = 0.5,  # Set bar opacity
  marker = list(
    line = list(color = "black", width = 1)  # Add border to bars
  )
) %>%
  layout(
    barmode = "group",  # Set bar mode to grouped
    xaxis = list(title = "Work Accident"),
    yaxis = list(title = "Percentage"),
    legend = list(title = list(text = "Turnover Status"))  # Legend title
  )
hist_work_accident
Show the code
#-----------------------------------------------------------------------------
# Turnover by Department ------------------------------------------------------
#-----------------------------------------------------------------------------

# Group data by department and left
department_summary <- employee_retention_data %>%
  group_by(department, left) %>%
  summarise(count = n(), .groups = "drop") %>%
  group_by(department) %>%
  mutate(percentage = count / sum(count) * 100)

# Map levels to meaningful labels
department_summary <- department_summary %>%
  mutate(left = factor(left, levels = c("0", "1"), labels = c("Stayed", "Left")))

# Create a bar chart
hist_dep <- plot_ly(
  data = department_summary,
  x = ~department,
  y = ~percentage,
  color = ~left,  # Use the mapped factor for color
  type = "bar",
  colors = c("Stayed" = "#00AFBB", "Left" = "#E7B800"),  # Match colors to labels
  opacity = 0.5,  # Set bar opacity
  marker = list(
    line = list(color = "black", width = 1)  # Add border to bars
  )
) %>%
  layout(
    barmode = "group",  # Set bar mode to grouped
    xaxis = list(title = "Department"),
    yaxis = list(title = "Percentage"),
    legend = list(title = list(text = "Turnover Status"))  # Legend title
  )
hist_dep
Show the code
#-----------------------------------------------------------------------------
# Turnover by Salary Level ----------------------------------------------------
#-----------------------------------------------------------------------------

# Group data by salary and left
salary_summary <- employee_retention_data %>%
  group_by(salary, left) %>%
  summarise(count = n(), .groups = "drop") %>%
  group_by(salary) %>%
  mutate(percentage = count / sum(count) * 100)

# Map levels to meaningful labels
salary_summary <- salary_summary %>%
  mutate(left = factor(left, levels = c("0", "1"), labels = c("Stayed", "Left")))

# Create a bar chart
hist_salary <- plot_ly(
  data = salary_summary,
  x = ~salary,
  y = ~percentage,
  color = ~left,  # Use the mapped factor for color
  type = "bar",
  colors = c("Stayed" = "#00AFBB", "Left" = "#E7B800"),  # Match colors to labels
  opacity = 0.5,  # Set bar opacity
  marker = list(
    line = list(color = "black", width = 1)  # Add border to bars
  )
) %>%
  layout(
    barmode = "group",  # Group the bars
    xaxis = list(title = "Salary Level"),
    yaxis = list(title = "Percentage of Employees"),
    legend = list(title = list(text = "Turnover Status"))  # Custom legend title
  )
hist_salary
Show the code
#-----------------------------------------------------------------------------
# Promotion in Last 5 Years by Turnover ---------------------------------------
#-----------------------------------------------------------------------------

promotion_summary <- employee_retention_data %>%
      group_by(promotion_last_5years, left) %>%
      summarise(count = n(), .groups = "drop") %>%
      group_by(promotion_last_5years) %>%
      mutate(percentage = count / sum(count) * 100)
    
    # Map levels to meaningful labels
    promotion_summary <- promotion_summary %>%
      mutate(left = factor(left, levels = c("0", "1"), labels = c("Stayed", "Left")))
    
    # Create a bar chart
    hist_promotion <- plot_ly(
      data = promotion_summary,
      x = ~promotion_last_5years,
      y = ~percentage,
      color = ~left,  # Use the mapped factor for color
      type = "bar",
      colors = c("Stayed" = "#00AFBB", "Left" = "#E7B800"),  # Match color to new labels
      opacity = 0.5,  # Set bar opacity
      marker = list(
        line = list(color = "black", width = 1)  # Add border to bars
      )
    ) %>%
      layout(
        barmode = "group",  # Set bar mode to grouped
        xaxis = list(title = "Promotion in Last 5 Years"),
        yaxis = list(title = "Percentage"),
        legend = list(title = list(text = "Turnover Status"))  # Legend title
      )
    hist_promotion

Purely based on visual inspection of the histograms:

  1. Satisfaction Level: There may be a difference in satisfaction levels between employees who stayed and those who left. A large proportion of employees who left have satisfaction scores below 0.50. There also appears to be higher and more consistent satisfaction scores among employees who stayed. Overall, this suggests lower satisfaction levels may be related to higher turnover, and this could be an important variable to further examine.

  2. Employee Performance on Last Evaluation: Among employees who left, there appears to be a U-shaped pattern in turnover, with clusters at both lower and higher ends of the performance scale. This “U-shaped” pattern could suggest that turnover may be more common among employees with either low or high performance ratings, while those with mid-range performance ratings (approximately 0.6 to 0.8) may be more likely to stay. However, it’s important to note that there are still a high number of employees who stayed across all performance levels, including those on the lower and upper end. This suggests that performance ratings alone may not capture the full story.

  3. Workload (Based on Average Monthly Hours): Employees who left tend to cluster at both ends of the workload spectrum. Those working very few hours (around 150 or below) and those working very high hours (above around 250) show higher turnover rates. This pattern suggests that both under-utilization and overwork may be associated with a higher likelihood of turnover.

  4. Workload (Based on Number of Projects): Employees who left tend to have either very few projects (e.g., 2) or a high number (6 or 7). The majority of employees who stayed tend to have a mid-range number of projects. This pattern suggests that both under-engagement and over-engagement could relate to turnover.

3.4 Splitting the Data

Before running any predictive models, the first step is to split our data into training, validation, and test datasets. We’ll randomly assign 70% of the employee rows to the training set, 15% to the validation set, and the remaining 15% to the test set.

We’ll use the training (and validation) data to build and train our models and the test data to evaluate them after they’ve been trained, to see how well they predict new data they haven’t seen before.

This will help us gauge the extent to which the models we’ve built generalize to new data. In other words, it’ll help us detect overfitting (where the model learns the training data too well and fails to generalize to new data) and flag possible issues before a model like this is deployed in real-world settings.

Show the code
set.seed(123)  # For reproducibility

# Define proportions
train_prop <- 0.7
validation_prop <- 0.15
test_prop <- 0.15

# Generate a random grouping vector that assigns each row randomly to either training, validation, or test based on proportions defined above. 
n <- nrow(employee_retention_data)
group <- sample(c("train", "validation", "test"), size = n, replace = TRUE, prob = c(train_prop, validation_prop, test_prop))

# Split the data into three sets
train_data <- employee_retention_data[group == "train", ]
validation_data <- employee_retention_data[group == "validation", ]
test_data <- employee_retention_data[group == "test", ]

# Combine the training and validation sets for the logistic regression. The MLP uses both during training, so we want the logistic regression to have access to the same data.

train_validation_data <- rbind(train_data, validation_data)

4 Logistic Regression

The first model we try is a relatively simple one: logistic regression.

Logistic regression is used to model the relationship between one or more independent variables (predictors) and a binary dependent variable (outcome). Unlike linear regression, which predicts a continuous outcome (like height or salary), logistic regression predicts probabilities for binary outcomes. It estimates the likelihood that an event occurs (e.g., an employee leaving) based on the predictors.

  • Logistic regression works by estimating odds. The odds are a way of expressing the likelihood of an event happening compared to it not happening. For example, if the odds of an employee leaving are 3:1, it means they are three times more likely to leave than they are to not leave (i.e., to stay).

  • Logistic regression models the log odds (the logit) of the outcome as a linear function of the predictors. The logistic function (the inverse of the logit) then converts these log odds into probabilities, ensuring that the predicted probabilities always fall between 0 and 1.

  • The coefficients of the logistic regression indicate how much the log odds of the outcome change with a one-unit increase in the predictor.
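As a quick numeric sketch (using a made-up log-odds value, not one from the models below), the relationship between log odds, odds, and probabilities looks like this:

```r
# Hypothetical log odds for a single employee (made-up value, roughly ln(3)):
log_odds <- 1.0986

# Exponentiating log odds gives the odds of leaving vs. staying (~3:1)
odds <- exp(log_odds)

# The logistic function maps log odds back to a probability in (0, 1)
prob <- 1 / (1 + exp(-log_odds))

round(odds, 2)  # ~3: three times more likely to leave than to stay
round(prob, 2)  # 0.75: odds of 3:1 correspond to a 75% probability
```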

4.1 Checking for Nested Data Structure

First, it’s important to note that our data may have a nested structure. Specifically, employees (level 1) may be nested within departments (level 2). Employees within the same department could share characteristics or experiences (e.g., similar working conditions, management style) that could make their data correlated. If the data is nested by department, the assumption of independence of observations in standard logistic regression would be violated, and we should use a multilevel logistic regression instead. This matters because patterns that emerge when we look at the data as a whole may not apply (or could look different) when we look at relationships within groups.

  • Multilevel logistic regression partitions the total variance in the outcome into within-group variance and between-group variance to account for the nested structure of the data. In our case, the within-group variance captures how much the outcome varies between employees within the same department, while the between-group variance captures how much the outcome varies across different departments.

  • In addition to modeling fixed effects (predictors like job satisfaction, hours worked, etc.), multilevel modeling introduces random effects to account for variability in the outcome across groups. The fixed effects represent the average relationship between individual predictors and the outcome across all groups, while the random effects allow for variability in the baseline log-odds of the outcome across groups. This approach adjusts for group-level differences while capturing individual-level effects, enhancing the accuracy and reliability of the estimates.

To check if the nesting is substantial within our data, we can compute the intraclass correlation coefficient (ICC). The ICC is the proportion of the total variance in the outcome that can be attributed to the grouping variable (i.e., departments). It is calculated by dividing the group-level variance (i.e., the variance of the random intercepts) by the total variance (variance of random intercepts + residual variance).

Show the code
# First, we rescale our continuous variables by converting them to Z-scores. 

# Rescale continuous variables
employee_retention_data$satisfaction_level <- scale(employee_retention_data$satisfaction_level)

employee_retention_data$last_evaluation <- scale(employee_retention_data$last_evaluation)

employee_retention_data$average_monthly_hours <- scale(employee_retention_data$average_monthly_hours)

# Fit the model
model <- glmer(left ~ satisfaction_level + last_evaluation + number_project +
               average_monthly_hours + time_spend_company + work_accident +
               promotion_last_5years + salary + (1 | department),
               data = employee_retention_data, family = binomial)

# Extract random effect variance
random_effect_variance <- as.numeric(VarCorr(model)$department[1, 1])
random_effect_variance
[1] 0.04029
Show the code
# Residual variance for logistic regression
residual_variance <- pi^2 / 3

# Calculate ICC
ICC <- random_effect_variance / (random_effect_variance + residual_variance)

ICC
[1] 0.0121

An ICC of 0.0121 suggests that 1.21% of the variance in the outcome (left) is attributable to differences between departments, while the remaining variance is at the individual level. This suggests that there is minimal variation in the outcome across departments. In other words, departmental grouping has a negligible effect on predicting whether an employee leaves. As a result, a simpler logistic regression without random effects will be considered sufficient.

4.2 Assumption Checking for Logistic Regression

  • Assumption 1: Binary Outcome Variable:
    The dependent variable must be binary, such as whether an employee left (1) or stayed (0).

  • Assumption 2: Independence of Observations:
    Each observation should be independent. This is typically ensured through study design (e.g., random sampling). We checked for interdependence among observations within departments (via ICC) and found it not to be a concern.

  • Assumption 3: Linearity in the Logit:
    Continuous predictors should have a linear relationship with the log odds of the outcome (not the outcome itself). If this assumption is violated, transformations or categorization of predictors may be needed.

    • Note: This assumption only applies to continuous predictors because they can take on a range of values, and we want to see if there’s a straight-line trend between those values and the log odds of the outcome. For categorical variables, categories are treated as distinct groups without assumed relationships. The model compares outcomes for each group against a reference group, so linearity does not apply. This means that if the assumption is violated for continuous predictors, we can convert them to categorical variables.

    • To check the linearity assumption, we can plot each predictor against the logit (the log odds) of the outcome. If the linearity assumption is met, we should expect to see a relatively straight line.

Show the code
pacman::p_load(performance, DHARMa)

#-----------------------------------------------------------------------------
# Create the Logistic Regression Model ---------------------------------------
#-----------------------------------------------------------------------------

outcome <- "left_numeric"

predictors <- c("satisfaction_level", "last_evaluation", "average_monthly_hours", 
                "number_project", "time_spend_company", "work_accident",
                "promotion_last_5years",   "salary", "department")

# Train the model on training data
model <- glm(left_numeric ~ 
               satisfaction_level + 
               last_evaluation + 
               average_monthly_hours + 
               number_project + 
               time_spend_company+
               work_accident+
               promotion_last_5years+
               salary+
               department, 
             data = train_validation_data, 
             family = binomial)

#-----------------------------------------------------------------------------
# Extract Probabilities and Logits --------------------------------------------
#-----------------------------------------------------------------------------

probabilities <- predict(model, type = "response")

logit <- log(probabilities/(1-probabilities))
Show the code
linearity_sat <- ggplot(train_validation_data, aes(logit,satisfaction_level))+
  geom_point(size=0.5, alpha=0.5) + 
  geom_smooth(method="loess") +
  theme_bw()

linearity_sat
`geom_smooth()` using formula = 'y ~ x'

Show the code
linearity_perf <- ggplot(train_validation_data, aes(logit,last_evaluation))+
  geom_point(size=0.5, alpha=0.5) + 
  geom_smooth(method="loess") +
  theme_bw()

linearity_perf
`geom_smooth()` using formula = 'y ~ x'

Show the code
linearity_hours <- ggplot(train_validation_data, aes(logit,average_monthly_hours))+
  geom_point(size=0.5, alpha=0.5) + 
  geom_smooth(method="loess") +
  theme_bw()

linearity_hours
`geom_smooth()` using formula = 'y ~ x'

Show the code
linearity_projects <- ggplot(train_validation_data, aes(logit,number_project))+
  geom_point(size=0.5, alpha=0.5) + 
  geom_smooth(method="loess") +
  theme_bw()

linearity_projects
`geom_smooth()` using formula = 'y ~ x'

Show the code
linearity_tenure <- ggplot(train_validation_data, aes(logit,time_spend_company))+
  geom_point(size=0.5, alpha=0.5) + 
  geom_smooth(method="loess") +
  theme_bw()

linearity_tenure
`geom_smooth()` using formula = 'y ~ x'

Show the code
linearity_accident <- ggplot(train_validation_data, aes(logit,work_accident))+
  geom_point(size=0.5, alpha=0.5) + 
  geom_smooth(method="loess") +
  theme_bw()

linearity_accident
`geom_smooth()` using formula = 'y ~ x'

Show the code
linearity_promotion <- ggplot(train_validation_data, aes(logit,promotion_last_5years))+
  geom_point(size=0.5, alpha=0.5) + 
  geom_smooth(method="loess") +
  theme_bw()

linearity_promotion
`geom_smooth()` using formula = 'y ~ x'

From these plots, we can see that none of the continuous predictors appears to meet the linearity assumption; each shows clear curvature rather than a straight-line trend.

We could try transforming the predictors to see if that helps; but in our case, we’ll convert them to categorical variables instead. We’ll do so by dividing each continuous predictor into quartiles and assigning the resulting groups labels of “Low,” “Lower-Mid,” “Upper-Mid,” and “High.”

Show the code
#-----------------------------------------------------------------------------
# Convert Continuous Predictors to Factor Variables --------------------------
#-----------------------------------------------------------------------------

# We'll try categorizing the predictors into low, medium, and high levels and treating them as categorical variables. We'll use quantiles to define cutoffs. 

train_validation_data <- train_validation_data %>%
  mutate(
    satisfaction_level_cat = case_when(
      satisfaction_level <= quantile(satisfaction_level, 0.25) ~ "Low",
      satisfaction_level > quantile(satisfaction_level, 0.25) & satisfaction_level <= quantile(satisfaction_level, 0.50) ~ "Lower-Mid",
      satisfaction_level > quantile(satisfaction_level, 0.50) & satisfaction_level <= quantile(satisfaction_level, 0.75) ~ "Upper-Mid",
      satisfaction_level > quantile(satisfaction_level, 0.75) ~ "High"
    ),
    last_evaluation_cat = case_when(
      last_evaluation <= quantile(last_evaluation, 0.25) ~ "Low",
      last_evaluation > quantile(last_evaluation, 0.25) & last_evaluation <= quantile(last_evaluation, 0.50) ~ "Lower-Mid",
      last_evaluation > quantile(last_evaluation, 0.50) & last_evaluation <= quantile(last_evaluation, 0.75) ~ "Upper-Mid",
      last_evaluation > quantile(last_evaluation, 0.75) ~ "High"
    ),
    average_monthly_hours_cat = case_when(
      average_monthly_hours <= quantile(average_monthly_hours, 0.25) ~ "Low",
      average_monthly_hours > quantile(average_monthly_hours, 0.25) & average_monthly_hours <= quantile(average_monthly_hours, 0.50) ~ "Lower-Mid",
      average_monthly_hours > quantile(average_monthly_hours, 0.50) & average_monthly_hours <= quantile(average_monthly_hours, 0.75) ~ "Upper-Mid",
      average_monthly_hours > quantile(average_monthly_hours, 0.75) ~ "High"
    ),
    number_project_cat = case_when(
      number_project <= quantile(number_project, 0.25) ~ "Low",
      number_project > quantile(number_project, 0.25) & number_project <= quantile(number_project, 0.50) ~ "Lower-Mid",
      number_project > quantile(number_project, 0.50) & number_project <= quantile(number_project, 0.75) ~ "Upper-Mid",
      number_project > quantile(number_project, 0.75) ~ "High"
    ),
    time_spend_company_cat = case_when(
      time_spend_company <= quantile(time_spend_company, 0.25) ~ "Low",
      time_spend_company > quantile(time_spend_company, 0.25) & time_spend_company <= quantile(time_spend_company, 0.50) ~ "Lower-Mid",
      time_spend_company > quantile(time_spend_company, 0.50) & time_spend_company <= quantile(time_spend_company, 0.75) ~ "Upper-Mid",
      time_spend_company > quantile(time_spend_company, 0.75) ~ "High"
    )
  )

# Convert to factors
train_validation_data <- train_validation_data %>%
  mutate(across(ends_with("_cat"), as.factor))
  • Assumption 4: Absence of Multicollinearity:
    Predictors should not be too highly correlated with each other. High multicollinearity can make it difficult to assess the effect of each predictor.
    • This can be checked by looking at the correlation matrix. As a rule of thumb, anything above 0.7 would be too highly correlated. We would remove those from the model since they would be redundant. From our matrix above, we can see that the predictors don’t seem to be too highly correlated.
    • This can also be checked using Variance Inflation Factor (VIF) scores, where VIF values above 5 or 10 indicate a potential multicollinearity problem. As seen below, all the VIF values are below 5, suggesting there is no multicollinearity problem.
    • In the code below, we calculate the VIF for our predictors.
Show the code
# Check the variance inflation factor scores
vif(model)
                       GVIF Df GVIF^(1/(2*Df))
satisfaction_level    1.156  1           1.075
last_evaluation       1.440  1           1.200
average_monthly_hours 1.533  1           1.238
number_project        1.786  1           1.336
time_spend_company    1.119  1           1.058
work_accident         1.013  1           1.006
promotion_last_5years 1.017  1           1.008
salary                1.045  2           1.011
department            1.051  9           1.003
  • Assumption 5: Large Sample Size:
    Logistic regression generally requires a larger sample size to provide reliable estimates. A rule of thumb is to have at least 10 events (e.g., “1” outcomes) per predictor in the model. This helps ensure stability in the estimates.

    • In the code below, we check if the sample size is large enough by calculating the number of cases of turnover divided by the number of predictors.
Show the code
# Count the number of "1" outcomes 
left_count <- sum(employee_retention_data$left_numeric == 1)

# Number of predictors (update as needed)
predictors_count <- length(c("satisfaction_level", "last_evaluation", "average_monthly_hours","number_project", "time_spend_company", "work_accident", "promotion_last_5years", "salary", "department"))

# Divide the number of "1" outcomes by the number of predictors
left_per_predictor <- left_count / predictors_count
left_per_predictor
[1] 396.8
  • To-Check 1: No Important Outliers:
    While not a formal assumption, logistic regression can be sensitive to outliers. Outliers can disproportionately affect the model’s estimates and should be examined before fitting the model.
Show the code
# Reshape data into long format for faceting
numeric_data_long <- employee_retention_data %>%
  dplyr::select(where(is.numeric), -c(promotion_last_5years, work_accident)) %>%
  pivot_longer(cols = everything(), names_to = "variable", values_to = "value")

# Create faceted boxplots
ggplot(numeric_data_long, aes(y = value)) +
  geom_boxplot(fill = "skyblue") +
  facet_wrap(~ variable, scales = "free_y") +
  labs(title = "Box Plots of Numeric Variables", x = "Variable", y = "Value") +
  theme_minimal() +
  theme(axis.text.x = element_blank())

  1. Average Monthly Hours:
    • The distribution of monthly hours appears fairly centralized, with no significant outliers.
    • Most employees have monthly hours between 150 and 250, suggesting a generally consistent work schedule across the dataset.
  2. Last Evaluation:
    • Scores for the last evaluation are relatively high, generally clustering between 0.6 and 0.9.
    • No obvious outliers are present, indicating that most employees receive evaluations within a similar range.
  3. Number of Projects:
    • The number of projects ranges from 2 to 7, with a central concentration around 3 to 5 projects.
    • There are no extreme outliers, suggesting that employees’ project loads are fairly consistent.
  4. Satisfaction Level:
    • Satisfaction scores span the full range from 0 to 1, indicating diverse levels of employee satisfaction.
    • The distribution is slightly skewed toward higher satisfaction scores, with no obvious outliers.
  5. Time Spent at the Company:
    • This variable shows some notable outliers, with a few employees spending significantly longer at the company than the majority.
    • Real-World Note: In a real-world scenario, we would investigate these high values to confirm whether they represent actual long-tenured employees or potential data entry errors. For the purposes of this analysis, we’ll assume that these values are valid and reflect long-tenured employees.

4.3 Fitting the Logistic Regression

We can now build our logistic regression model using our training data. We regress the binary outcome, left, on the categorical predictors we created above.

For each categorical variable, R automatically sets one level as the reference category (alphabetically first unless specified otherwise). All other levels are interpreted relative to this baseline. In our case, R selected the “High” categories as the reference groups.
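A minimal sketch of how the reference level is chosen (and how to override it), using a toy factor rather than the project data:

```r
# Toy factor (hypothetical labels): glm() treats the first level as baseline
x <- factor(c("Low", "High", "Upper-Mid", "Low"))
levels(x)[1]  # "High" sorts first alphabetically, so it becomes the reference

# relevel() overrides the default and makes "Low" the baseline instead
x2 <- relevel(x, ref = "Low")
levels(x2)[1]  # "Low"
```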

The results are displayed in terms of odds ratios, which are calculated by exponentiating the model coefficients. The model outputs log-odds coefficients, but these are less interpretable than odds ratios, so we present the odds ratios below.

  • Interpretation: If an odds ratio is above 1, it means an increase in that predictor increases the odds of the outcome (e.g., leaving), while values below 1 suggest a decrease in odds.
Show the code
# Train the model on training data
logistic_model <- glm(left_numeric ~ 
                        satisfaction_level_cat + 
                        last_evaluation_cat +
                        average_monthly_hours_cat + 
                        number_project_cat + 
                        time_spend_company_cat + 
                        work_accident + 
                        promotion_last_5years + 
                        salary + 
                        department,
             data = train_validation_data, 
             family = binomial)
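Once a model like this is fit, the odds ratios are obtained by exponentiating the coefficients. A self-contained sketch on a small simulated dataset (not the project data):

```r
set.seed(42)

# Simulated toy data: a binary outcome and one continuous predictor
toy <- data.frame(
  left = rbinom(200, 1, 0.3),
  satisfaction = runif(200)
)

# Fit a logistic regression on the toy data
fit <- glm(left ~ satisfaction, data = toy, family = binomial)

# Exponentiating the log-odds coefficients yields odds ratios:
# values above 1 increase the odds of leaving, values below 1 decrease them
exp(coef(fit))
```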

4.3.1 Evaluating Model Fit

AIC (Akaike Information Criterion): This is a commonly used metric for model fit in logistic regression. Lower AIC values indicate a better fit. You can compare AIC values between different models to see which model fits the data better.

Show the code
AIC(logistic_model)
[1] 8434

Pseudo R-squared: Unlike linear regression, there’s no true R-squared for logistic regression, but you can use pseudo R-squared values as an approximation of model fit. Examples include McFadden’s R-squared, Cox & Snell R-squared, and Nagelkerke R-squared.

Show the code
library(pscl)
pR2(logistic_model)
fitting null model for pseudo-r2
       llh    llhNull         G2   McFadden       r2ML       r2CU 
-4188.9720 -6992.0747  5606.2054     0.4009     0.3561     0.5343 

4.3.2 Assessing Training Set Accuracy

For every employee, the model estimates the log odds of leaving as a weighted sum of the predictors: log(odds) = weighted predictors + constant.

  • Applying the logistic function (the inverse of the log odds transformation) converts these log odds back into a probability, meaning that the output of the model is an estimate of the probability that the person will leave.

We then convert these probabilities to binary outcomes (1 = they left, if the probability is greater than 0.5; 0 = they did not leave, if the probability is less than 0.5).

We can evaluate accuracy by comparing predicted outcomes to actual outcomes. If the predicted class matches the actual outcome, it counts as a correct prediction. Dividing the number of correct predictions by the total number of predictions gives us the accuracy of the model on the training set.

We do this to obtain a baseline against which to compare accuracy on the test dataset. If we get high accuracy on the training set and lower accuracy on the test set, it suggests the model does not generalize well to new data (i.e., overfitting). If accuracy is low on both, the model may be underfitting: it generalizes, but isn’t capturing the patterns in the data.

Show the code
#-----------------------------------------------------------------------------
# Test Predictions on Training Dataset ----------------------------------------
#-----------------------------------------------------------------------------

# Predict on training data
predictions <- predict(logistic_model, newdata = train_validation_data, type = "response")

# Convert probabilities to binary predictions (e.g., 0.5 threshold)
predicted_outcome <- ifelse(predictions > 0.5, 1, 0)

# Evaluate accuracy
train_accuracy_lr <- round((mean(predicted_outcome == train_validation_data$left_numeric))*100,2)
  • Training set accuracy: 85.24

4.3.3 Assessing Test Set Accuracy

We now test how accurate the model is when predicting the outcome using a new dataset.

Show the code
#-----------------------------------------------------------------------------
# Convert Continuous Predictors to Factor Variables In Testing Dataset -------
#-----------------------------------------------------------------------------

# Note that we have to apply the same quantiles that we calculated from the training dataset to the testing dataset. 

# Calculate quartiles from training data
satisfaction_level_quantiles <- quantile(train_validation_data$satisfaction_level, probs = c(0.25, 0.50, 0.75))
last_evaluation_quantiles <- quantile(train_validation_data$last_evaluation, probs = c(0.25, 0.50, 0.75))
average_monthly_hours_quantiles <- quantile(train_validation_data$average_monthly_hours, probs = c(0.25, 0.50, 0.75))
number_project_quantiles <- quantile(train_validation_data$number_project, probs = c(0.25, 0.50, 0.75))
time_spend_company_quantiles <- quantile(train_validation_data$time_spend_company, probs = c(0.25, 0.50, 0.75))

# Now we use these to categorize the predictors in the testing dataset.

test_data_lr <- test_data %>%
  mutate(
    satisfaction_level_cat = case_when(
      satisfaction_level <= satisfaction_level_quantiles[1] ~ "Low",
      satisfaction_level > satisfaction_level_quantiles[1] & satisfaction_level <= satisfaction_level_quantiles[2] ~ "Lower-Mid",
      satisfaction_level > satisfaction_level_quantiles[2] & satisfaction_level <= satisfaction_level_quantiles[3] ~ "Upper-Mid",
      satisfaction_level > satisfaction_level_quantiles[3] ~ "High"
    ),
    last_evaluation_cat = case_when(
      last_evaluation <= last_evaluation_quantiles[1] ~ "Low",
      last_evaluation > last_evaluation_quantiles[1] & last_evaluation <= last_evaluation_quantiles[2] ~ "Lower-Mid",
      last_evaluation > last_evaluation_quantiles[2] & last_evaluation <= last_evaluation_quantiles[3] ~ "Upper-Mid",
      last_evaluation > last_evaluation_quantiles[3] ~ "High"
    ),
    average_monthly_hours_cat = case_when(
      average_monthly_hours <= average_monthly_hours_quantiles[1] ~ "Low",
      average_monthly_hours > average_monthly_hours_quantiles[1] & average_monthly_hours <= average_monthly_hours_quantiles[2] ~ "Lower-Mid",
      average_monthly_hours > average_monthly_hours_quantiles[2] & average_monthly_hours <= average_monthly_hours_quantiles[3] ~ "Upper-Mid",
      average_monthly_hours > average_monthly_hours_quantiles[3] ~ "High"
    ),
    number_project_cat = case_when(
      number_project <= number_project_quantiles[1] ~ "Low",
      number_project > number_project_quantiles[1] & number_project <= number_project_quantiles[2] ~ "Lower-Mid",
      number_project > number_project_quantiles[2] & number_project <= number_project_quantiles[3] ~ "Upper-Mid",
      number_project > number_project_quantiles[3] ~ "High"
    ),
    time_spend_company_cat = case_when(
      time_spend_company <= time_spend_company_quantiles[1] ~ "Low",
      time_spend_company > time_spend_company_quantiles[1] & time_spend_company <= time_spend_company_quantiles[2] ~ "Lower-Mid",
      time_spend_company > time_spend_company_quantiles[2] & time_spend_company <= time_spend_company_quantiles[3] ~ "Upper-Mid",
      time_spend_company > time_spend_company_quantiles[3] ~ "High"
    )
  )
# Convert to factors
test_data_lr <- test_data_lr %>%
  mutate(across(ends_with("_cat"), as.factor))

#-----------------------------------------------------------------------------
# Test Predictions on Testing Dataset ----------------------------------------
#-----------------------------------------------------------------------------

# Predict on test data
predictions <- predict(logistic_model, newdata = test_data_lr, type = "response")

# Convert probabilities to binary predictions (e.g., 0.5 threshold)
predicted_outcome <- ifelse(predictions > 0.5, 1, 0)

# Evaluate accuracy
test_accuracy_lr <- round((mean(predicted_outcome == test_data$left_numeric))*100,2)

#-----------------------------------------------------------------------------
# Save Files for Dashboard ---------------------------------------------------
#-----------------------------------------------------------------------------

saveRDS(test_data_lr, "C:/Users/ClaudieCoulombe/OneDrive - System 3/General P/Employee Turnover/test_data_LR.rds")
saveRDS(train_validation_data, "C:/Users/ClaudieCoulombe/OneDrive - System 3/General P/Employee Turnover/train_data_LR.rds")
saveRDS(logistic_model, "C:/Users/ClaudieCoulombe/OneDrive - System 3/General P/Employee Turnover/logistic_model.rds")
  • Test set accuracy: 84.89

4.3.4 Testing Precision, Recall, Sensitivity, and Specificity

Accuracy measures the proportion of correct predictions (both true positives and true negatives) out of all predictions. However, it doesn’t differentiate between the types of errors made, such as false positives (Type 1 Errors) and false negatives (Type 2 Errors). Understanding the rate of these errors is important when evaluating the performance of a model.

  • Precision: Of all the people the model predicted would leave, how many were actually correct?

    • Precision = True Positives / (True Positives + False Positives)
    • High precision indicates a lower rate of Type I errors (flagging a risk that isn’t there).
    • High precision is important when the cost of false positives is high (e.g., when retention interventions are resource-intensive).
  • Recall (Sensitivity): Of all the people who actually left, how many did the model correctly identify?

    • Recall = True Positives / (True Positives + False Negatives)
      • Recall = (Number of actual leavers correctly identified) / (Total number of actual leavers)
      • High recall indicates a lower rate of Type II errors (missing a risk that exists).
      • High recall is essential when the cost of false negatives is high (e.g., losing talent).
  • Specificity: Of all the people who stayed, how many did the model correctly identify?

    • Specificity = True Negatives / (True Negatives + False Positives)
    • High specificity means the model avoids false alarms (false positives) among employees who stayed.
  • There is often a trade-off between precision and recall: improving one can decrease the other. Which one to prioritize depends on the context. For example, in medical diagnostics, minimizing false negatives (Type II errors) is often relatively more important because of the consequences for the patient.

  • To achieve a balance, we can use the F1 Score, which is the harmonic mean of precision and recall:

    • F1 = 2 x ((Precision x Sensitivity)/(Precision + Sensitivity)).
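
As a sanity check, all four metrics can be computed directly from the cells of a 2×2 confusion matrix. The sketch below uses made-up counts (not the counts from our model) purely to illustrate the formulas.

```r
# Hypothetical confusion-matrix counts (illustration only, not our model's results)
tp <- 50; fp <- 20   # true positives, false positives
fn <- 30; tn <- 300  # false negatives, true negatives

precision   <- tp / (tp + fp)          # of predicted leavers, how many actually left?
recall      <- tp / (tp + fn)          # of actual leavers, how many were caught?
specificity <- tn / (tn + fp)          # of actual stayers, how many were identified?
f1          <- 2 * (precision * recall) / (precision + recall)

round(c(precision = precision, recall = recall,
        specificity = specificity, F1 = f1), 4)
# precision 0.7143, recall 0.6250, specificity 0.9375, F1 0.6667
```
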
Show the code
# Convert probabilities to binary predictions using a 0.5 threshold
predicted_outcome <- ifelse(predictions > 0.5, 1, 0)

# Create a confusion matrix (using the caret package)
conf_matrix <- confusionMatrix(
  factor(predicted_outcome),
  factor(test_data$left_numeric),
  positive = "1"
)

# Extract precision and recall
precision_lr <- conf_matrix$byClass["Pos Pred Value"]
recall_lr <- conf_matrix$byClass["Sensitivity"]
specificity_lr <- conf_matrix$byClass["Specificity"]
f1_score_lr <- conf_matrix$byClass["F1"]

# Print the results
print(paste("Precision:", round(precision_lr, 4)))
[1] "Precision: 0.7097"
Show the code
print(paste("Sensitivity/Recall:", round(recall_lr, 4)))
[1] "Sensitivity/Recall: 0.6145"
Show the code
print(paste("Specificity:", round(specificity_lr, 4)))
[1] "Specificity: 0.9218"
Show the code
print(paste("F1 Score:", round(f1_score_lr, 4)))
[1] "F1 Score: 0.6587"
  • These metrics suggest that approximately 71% of employees predicted to leave actually leave (i.e., 29% of those flagged are false positives).

  • Around 61% of employees who left were correctly identified by the model (i.e., 39% of actual leavers were missed as false negatives).

  • The F1 score of 0.66 reflects a moderate balance between precision and sensitivity.

  • In our case, we are likely more concerned with false negatives (Type II errors). Missing individuals who are likely to leave prevents timely interventions and leads to the loss of talent, so the relatively high false negative rate of our model is a concern.
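
One common lever for reducing false negatives is to lower the classification threshold below 0.5, so that employees with a more moderate predicted probability are still flagged. The sketch below is illustrative only: it uses a hypothetical vector of predicted probabilities rather than our model's actual `predictions`.

```r
# Hypothetical predicted probabilities and true outcomes (illustration only)
probs  <- c(0.10, 0.35, 0.45, 0.55, 0.80, 0.30, 0.65, 0.40)
actual <- c(0,    0,    1,    1,    1,    0,    1,    1)

flag_at <- function(threshold) as.numeric(probs > threshold)

# At the default 0.5 threshold, two true leavers are missed...
sum(actual == 1 & flag_at(0.5) == 0)  # false negatives at 0.5 -> 2
# ...lowering the threshold to 0.3 catches them,
# at the cost of flagging more stayers (false positives).
sum(actual == 1 & flag_at(0.3) == 0)  # false negatives at 0.3 -> 0
sum(actual == 0 & flag_at(0.3) == 1)  # false positives at 0.3 -> 1
```
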


5 Neural Network: Multi-Layer Perceptron

To explore whether a more advanced model can improve accuracy, we’ll now try a neural network.

A neural network is a model inspired by how the human brain works. It consists of “neurons” (also called nodes) connected in layers. In a Multi-Layer Perceptron (MLP), the network is organized into different types of “layers”:

  • Input layer: Receives the input data (e.g., the predictors)

  • Hidden layer(s): Layers between the input and output layers, where the network learns to detect patterns in the data.

  • Output layer: Produces the final prediction. For classification problems, the output layer typically contains neurons representing each class (e.g., “stay” or “leave”), and the network outputs a probability for each class. For regression problems, there’s often a single neuron in the output layer, providing a continuous value.

How Neurons Process Inputs

In a neural network, each “neuron” (or node) receives some inputs, performs calculations, and then decides (1) whether to “fire” (i.e., whether or not to pass its output on to the next layer), and (2) how much of the signal to send.

Each neuron receives inputs from the previous layer (or, in the case of the input layer, the original input data), multiplies each input by a weight that represents its importance, and adds a bias term. The neuron then sums these weighted inputs, and the result is passed through an activation function to produce the neuron’s output, which will be sent to the neurons in the next layer.
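
To make this concrete, here is a minimal base-R sketch of a single neuron's computation: a weighted sum of its inputs plus a bias, passed through an activation function (ReLU here). All numbers are made up for illustration.

```r
# Hypothetical inputs from the previous layer, with this neuron's weights and bias
inputs  <- c(0.5, -1.2, 3.0)
weights <- c(0.8,  0.4, 0.1)
bias    <- 0.2

relu <- function(z) max(0, z)  # ReLU: pass positive signal through, zero otherwise

z      <- sum(inputs * weights) + bias  # weighted sum of inputs plus bias
output <- relu(z)                       # the value sent to the next layer: 0.42
```
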

Hidden Layers

The hidden layers are where the network learns and detects patterns. These layers are often referred to as the “black box” of the network because we can’t directly observe the patterns they’re learning.

In research terms, the hidden layers essentially act as “mediators”; they take in the inputs and don’t just pass the information to the output, but instead “mediate” by processing, transforming, and combining the inputs to build complex representations that can help make the final prediction.

Just as mediators in research help us understand the process behind a relationship, hidden layers help the MLP capture and represent complex underlying processes in the data. They can be thought of as extracting intermediate representations that explain part of the relationship between input data and output prediction. Each hidden layer builds on the patterns detected by the previous layer, making it possible to capture more complex and non-linear relationships.

You can have more than one hidden layer, effectively creating multiple “mediator steps” in the relationship. Each additional layer allows the network to learn and represent more abstract patterns, enabling it to capture even more complex relationships in the data.

Non-Linearity and Activation Functions

An important part of neural networks is that they don’t assume linearity in the underlying relationship. By stacking multiple layers and using non-linear activation functions, an MLP can capture complex, non-linear relationships in the data, allowing it to model a wider range of patterns.

Activation function: An activation function is a mathematical function applied to the output of each neuron (node) in a neural network layer. After a neuron receives inputs, it calculates a weighted sum (adding together the inputs multiplied by their weights and adding a bias). This weighted sum is then passed through an activation function, which determines the neuron’s output.

The purpose of the activation function is to introduce non-linearity into the model. Without it, the entire network would behave like a single linear transformation, regardless of how many layers it has. By using activation functions, we allow the network to learn and represent non-linear relationships, which are common in real-world data.
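
A quick way to see why: without activation functions, stacked layers collapse into a single linear map, because composing matrix multiplications just yields another matrix. The sketch below (arbitrary random weights) demonstrates the collapse.

```r
set.seed(1)
x  <- c(1, 2, 3)                      # an input vector
W1 <- matrix(rnorm(9), 3, 3)          # "layer 1" weights
W2 <- matrix(rnorm(9), 3, 3)          # "layer 2" weights

two_layers <- W2 %*% (W1 %*% x)       # two linear layers, no activation between them
one_layer  <- (W2 %*% W1) %*% x       # a single equivalent linear layer

all.equal(two_layers, one_layer)      # TRUE: depth adds nothing without non-linearity
```
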

Training the Neural Network

Imagine a neural network is making predictions, like guessing whether an employee will stay or leave based on certain inputs. When the network makes a guess, we can compare the guess to the actual answer and see how far off it was. This difference is called the error.

The process of training a neural network involves reducing this error by adjusting the importance (or “weights”) the network assigns to each input feature. This is done through a process called backpropagation. The network works backwards through its layers to figure out which weights contributed most to the error and need adjusting. Then it adjusts the weights to reduce the error.

This process of making predictions, checking errors, and adjusting weights is repeated many times across the dataset. Each time, the network improves slightly, learning which inputs are more important for accurate predictions.

Over many rounds, the network gets better at making predictions because it has fine-tuned the weights, allowing it to minimize the error and better understand the relationship between inputs and the desired output.
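
As a toy illustration of this loop, the sketch below performs one gradient-descent update for a single sigmoid neuron with one input, using the standard cross-entropy gradient (prediction minus label, scaled by the input). All values are invented for illustration.

```r
sigmoid <- function(z) 1 / (1 + exp(-z))

x <- 2.0; y <- 1          # one hypothetical input and its true label ("left")
w <- 0.1; b <- 0.0        # initial weight and bias
lr <- 0.5                 # learning rate

pred  <- sigmoid(w * x + b)   # forward pass: make a guess (about 0.55)
error <- pred - y             # how far off the guess was

# Backpropagation for this tiny model: gradient of the
# cross-entropy loss with respect to each parameter
w <- w - lr * error * x
b <- b - lr * error

sigmoid(w * x + b) > pred     # TRUE: the updated guess has moved toward y
```
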

5.1 Fitting the Neural Network with Keras

We’ll build a simple model consisting of 4 layers, using the Keras package:

  • The input layer will accept data with 18 features (i.e., our one-hot-encoded predictors). The first hidden layer will contain 64 neurons with the ReLU activation function (to introduce non-linearity to model complex patterns).

  • The second hidden layer will contain 32 neurons, also with ReLU activation.

  • The third hidden layer will contain 8 neurons, with ReLU activation.

  • The output layer contains 1 neuron with a sigmoid activation function (the activation function suitable for binary classification tasks). The sigmoid activation outputs a probability score for the positive class (e.g., whether an employee will leave).

Each layer is fully connected to the previous one, ensuring all neurons contribute to learning.

In the section below, we can see the model training. The loss generally decreases as the model learns to minimize the error between predictions and outcomes. The accuracy generally increases as the model learns to make better predictions.

Show the code
#-----------------------------------------------------------------------------
# Initial Setup --------------------------------------------------------------
#-----------------------------------------------------------------------------

pacman::p_load(keras)

tensorflow::set_random_seed(42) # setting a seed so the random starting values it assigns to the weights in the model are repeatable.

#-----------------------------------------------------------------------------
# Setup the Model ------------------------------------------------------------
#-----------------------------------------------------------------------------

# One-hot encode the categorical predictors
train_data_encoded <- train_data %>%
  mutate(across(c(salary, department), as.factor)) %>%
  model.matrix(~ . - 1, data = .) %>%
  as.matrix()

validation_data_encoded <- validation_data %>%
  mutate(across(c(salary, department), as.factor)) %>%
  model.matrix(~ . - 1, data = .) %>%
  as.matrix()

test_data_encoded <- test_data %>%
  select(-contains("cat")) %>%
  mutate(across(c(salary, department), as.factor)) %>%
  model.matrix(~ . - 1, data = .) %>%
  as.matrix()

# Define the model
model <- keras_model_sequential() %>%
  layer_dense(units = 64, activation = 'relu', input_shape = c(18)) %>%
  layer_dense(units = 32, activation = 'relu') %>%
  layer_dense(units = 8, activation = 'relu') %>%
  layer_dense(units = 1, activation = 'sigmoid')

# Compile the model
model %>% compile(
  optimizer = 'adam',
  loss = 'binary_crossentropy',
  metrics = list('accuracy')
)

#-----------------------------------------------------------------------------
# Add Predictors and Outcome to the Model -------------------------------------
#-----------------------------------------------------------------------------

# Specify predictors 
exclude_columns <- c("left_numeric", "left0", "left1")
predictors <- setdiff(colnames(train_data_encoded), exclude_columns)

# Convert the training dataset to a matrix (keras works with matrices)
mlp_data <- as.matrix(train_data_encoded[, predictors])

# Specify what the outcome is using the training dataset. Called 'labels' in machine learning terms, because the outcome is often data that can be seen as the 'right answer' labelled by humans
labels <- as.matrix(train_data["left_numeric"])

# Do the same two steps above, but for the validation dataset.
validation_inputs <- as.matrix(validation_data_encoded[, predictors])
validation_labels <- as.matrix(as.numeric(validation_data[["left_numeric"]]))

# Create a checkpoint that will save to a file the weights every time the validation loss gets better (smaller).
checkpoint <- callback_model_checkpoint(
  filepath = "C:/Users/ClaudieCoulombe/OneDrive - System 3/General P/Employee Turnover/best_model.h5",  # Path to save the model
  monitor = "val_loss",        # Metric to monitor (validation loss)
  save_best_only = TRUE,       # Only save when the monitored metric improves
  mode = "min"                 # "min" because we want the lowest validation loss
)

#-----------------------------------------------------------------------------
# Train the Model ------------------------------------------------------------
#-----------------------------------------------------------------------------

# Train the model with the checkpoint callback
history <- model %>% fit(
  mlp_data,
  labels,
  epochs = 500,
  batch_size = 32,
  validation_data = list(validation_inputs, validation_labels),
  callbacks = list(checkpoint)  # Call the checkpoint after each epoch
)
Epoch 1/500
330/330 - 2s - loss: 0.9752 - accuracy: 0.7009 - val_loss: 0.5628 - val_accuracy: 0.7617 - 2s/epoch - 6ms/step
Epoch 2/500
330/330 - 1s - loss: 0.5628 - accuracy: 0.7618 - val_loss: 0.5610 - val_accuracy: 0.7617 - 696ms/epoch - 2ms/step
Epoch 3/500
330/330 - 1s - loss: 0.5611 - accuracy: 0.7618 - val_loss: 0.5587 - val_accuracy: 0.7617 - 660ms/epoch - 2ms/step
Epoch 4/500
330/330 - 1s - loss: 0.5580 - accuracy: 0.7618 - val_loss: 0.5550 - val_accuracy: 0.7617 - 795ms/epoch - 2ms/step
Epoch 5/500
330/330 - 1s - loss: 0.5544 - accuracy: 0.7619 - val_loss: 0.5556 - val_accuracy: 0.7617 - 774ms/epoch - 2ms/step
Epoch 6/500
330/330 - 1s - loss: 0.5375 - accuracy: 0.7618 - val_loss: 0.5170 - val_accuracy: 0.7617 - 742ms/epoch - 2ms/step
Epoch 7/500
330/330 - 1s - loss: 0.5102 - accuracy: 0.7612 - val_loss: 0.4776 - val_accuracy: 0.7563 - 753ms/epoch - 2ms/step
Epoch 8/500
330/330 - 1s - loss: 0.4755 - accuracy: 0.7643 - val_loss: 0.4426 - val_accuracy: 0.7763 - 735ms/epoch - 2ms/step
Epoch 9/500
330/330 - 1s - loss: 0.4440 - accuracy: 0.7769 - val_loss: 0.4115 - val_accuracy: 0.7872 - 921ms/epoch - 3ms/step
Epoch 10/500
330/330 - 1s - loss: 0.4157 - accuracy: 0.7921 - val_loss: 0.3885 - val_accuracy: 0.8355 - 776ms/epoch - 2ms/step
Epoch 11/500
330/330 - 1s - loss: 0.3890 - accuracy: 0.8147 - val_loss: 0.3698 - val_accuracy: 0.8027 - 702ms/epoch - 2ms/step
Epoch 12/500
330/330 - 1s - loss: 0.3519 - accuracy: 0.8408 - val_loss: 0.3282 - val_accuracy: 0.8870 - 864ms/epoch - 3ms/step
Epoch 13/500
330/330 - 1s - loss: 0.3297 - accuracy: 0.8620 - val_loss: 0.3347 - val_accuracy: 0.8182 - 698ms/epoch - 2ms/step
Epoch 14/500
330/330 - 1s - loss: 0.3160 - accuracy: 0.8721 - val_loss: 0.2986 - val_accuracy: 0.8884 - 708ms/epoch - 2ms/step
Epoch 15/500
330/330 - 1s - loss: 0.3083 - accuracy: 0.8741 - val_loss: 0.2881 - val_accuracy: 0.8793 - 666ms/epoch - 2ms/step
Epoch 16/500
330/330 - 1s - loss: 0.3027 - accuracy: 0.8797 - val_loss: 0.3053 - val_accuracy: 0.8729 - 597ms/epoch - 2ms/step
Epoch 17/500
330/330 - 1s - loss: 0.2892 - accuracy: 0.8866 - val_loss: 0.2698 - val_accuracy: 0.8911 - 651ms/epoch - 2ms/step
Epoch 18/500
330/330 - 1s - loss: 0.2794 - accuracy: 0.8920 - val_loss: 0.2555 - val_accuracy: 0.9043 - 682ms/epoch - 2ms/step
Epoch 19/500
330/330 - 1s - loss: 0.2742 - accuracy: 0.8961 - val_loss: 0.2662 - val_accuracy: 0.8929 - 694ms/epoch - 2ms/step
Epoch 20/500
330/330 - 1s - loss: 0.2700 - accuracy: 0.8941 - val_loss: 0.2662 - val_accuracy: 0.8957 - 743ms/epoch - 2ms/step
Epoch 21/500
330/330 - 1s - loss: 0.2669 - accuracy: 0.8972 - val_loss: 0.2597 - val_accuracy: 0.8984 - 583ms/epoch - 2ms/step
Epoch 22/500
330/330 - 1s - loss: 0.2595 - accuracy: 0.9010 - val_loss: 0.2528 - val_accuracy: 0.8984 - 614ms/epoch - 2ms/step
Epoch 23/500
330/330 - 1s - loss: 0.2608 - accuracy: 0.8955 - val_loss: 0.2419 - val_accuracy: 0.9016 - 607ms/epoch - 2ms/step
Epoch 24/500
330/330 - 1s - loss: 0.2558 - accuracy: 0.9022 - val_loss: 0.2532 - val_accuracy: 0.9039 - 606ms/epoch - 2ms/step
Epoch 25/500
330/330 - 1s - loss: 0.2653 - accuracy: 0.8943 - val_loss: 0.2829 - val_accuracy: 0.8829 - 624ms/epoch - 2ms/step
Epoch 26/500
330/330 - 1s - loss: 0.2542 - accuracy: 0.9029 - val_loss: 0.2273 - val_accuracy: 0.9121 - 651ms/epoch - 2ms/step
Epoch 27/500
330/330 - 1s - loss: 0.2504 - accuracy: 0.9022 - val_loss: 0.2313 - val_accuracy: 0.9130 - 654ms/epoch - 2ms/step
Epoch 28/500
330/330 - 1s - loss: 0.2472 - accuracy: 0.9033 - val_loss: 0.2318 - val_accuracy: 0.9057 - 649ms/epoch - 2ms/step
Epoch 29/500
330/330 - 1s - loss: 0.2452 - accuracy: 0.9056 - val_loss: 0.2365 - val_accuracy: 0.9084 - 621ms/epoch - 2ms/step
Epoch 30/500
330/330 - 1s - loss: 0.2414 - accuracy: 0.9037 - val_loss: 0.2495 - val_accuracy: 0.9011 - 649ms/epoch - 2ms/step
Epoch 31/500
330/330 - 1s - loss: 0.2477 - accuracy: 0.9033 - val_loss: 0.2197 - val_accuracy: 0.9130 - 723ms/epoch - 2ms/step
Epoch 32/500
330/330 - 1s - loss: 0.2428 - accuracy: 0.9046 - val_loss: 0.2185 - val_accuracy: 0.9166 - 636ms/epoch - 2ms/step
Epoch 33/500
330/330 - 1s - loss: 0.2308 - accuracy: 0.9095 - val_loss: 0.2396 - val_accuracy: 0.9080 - 608ms/epoch - 2ms/step
Epoch 34/500
330/330 - 1s - loss: 0.2325 - accuracy: 0.9105 - val_loss: 0.2145 - val_accuracy: 0.9153 - 704ms/epoch - 2ms/step
Epoch 35/500
330/330 - 1s - loss: 0.2303 - accuracy: 0.9100 - val_loss: 0.2080 - val_accuracy: 0.9212 - 685ms/epoch - 2ms/step
Epoch 36/500
330/330 - 1s - loss: 0.2276 - accuracy: 0.9116 - val_loss: 0.2153 - val_accuracy: 0.9166 - 722ms/epoch - 2ms/step
Epoch 37/500
330/330 - 1s - loss: 0.2286 - accuracy: 0.9140 - val_loss: 0.2029 - val_accuracy: 0.9257 - 676ms/epoch - 2ms/step
Epoch 38/500
330/330 - 1s - loss: 0.2251 - accuracy: 0.9142 - val_loss: 0.2412 - val_accuracy: 0.9039 - 692ms/epoch - 2ms/step
Epoch 39/500
330/330 - 1s - loss: 0.2249 - accuracy: 0.9125 - val_loss: 0.1988 - val_accuracy: 0.9326 - 710ms/epoch - 2ms/step
Epoch 40/500
330/330 - 1s - loss: 0.2140 - accuracy: 0.9185 - val_loss: 0.2086 - val_accuracy: 0.9235 - 658ms/epoch - 2ms/step
Epoch 41/500
330/330 - 1s - loss: 0.2141 - accuracy: 0.9196 - val_loss: 0.1991 - val_accuracy: 0.9271 - 565ms/epoch - 2ms/step
Epoch 42/500
330/330 - 1s - loss: 0.2149 - accuracy: 0.9200 - val_loss: 0.2018 - val_accuracy: 0.9244 - 551ms/epoch - 2ms/step
Epoch 43/500
330/330 - 1s - loss: 0.2154 - accuracy: 0.9194 - val_loss: 0.2734 - val_accuracy: 0.8902 - 593ms/epoch - 2ms/step
Epoch 44/500
330/330 - 1s - loss: 0.2174 - accuracy: 0.9186 - val_loss: 0.2076 - val_accuracy: 0.9235 - 756ms/epoch - 2ms/step
Epoch 45/500
330/330 - 1s - loss: 0.2140 - accuracy: 0.9197 - val_loss: 0.2466 - val_accuracy: 0.9080 - 712ms/epoch - 2ms/step
Epoch 46/500
330/330 - 1s - loss: 0.2092 - accuracy: 0.9228 - val_loss: 0.2094 - val_accuracy: 0.9248 - 578ms/epoch - 2ms/step
Epoch 47/500
330/330 - 1s - loss: 0.2063 - accuracy: 0.9235 - val_loss: 0.1951 - val_accuracy: 0.9289 - 604ms/epoch - 2ms/step
Epoch 48/500
330/330 - 1s - loss: 0.2111 - accuracy: 0.9226 - val_loss: 0.2347 - val_accuracy: 0.9103 - 711ms/epoch - 2ms/step
Epoch 49/500
330/330 - 1s - loss: 0.2035 - accuracy: 0.9259 - val_loss: 0.1877 - val_accuracy: 0.9358 - 669ms/epoch - 2ms/step
Epoch 50/500
330/330 - 1s - loss: 0.2062 - accuracy: 0.9254 - val_loss: 0.2165 - val_accuracy: 0.9171 - 587ms/epoch - 2ms/step
Epoch 51/500
330/330 - 1s - loss: 0.2088 - accuracy: 0.9211 - val_loss: 0.1881 - val_accuracy: 0.9317 - 678ms/epoch - 2ms/step
Epoch 52/500
330/330 - 1s - loss: 0.2023 - accuracy: 0.9266 - val_loss: 0.2039 - val_accuracy: 0.9207 - 639ms/epoch - 2ms/step
Epoch 53/500
330/330 - 1s - loss: 0.2024 - accuracy: 0.9297 - val_loss: 0.1924 - val_accuracy: 0.9294 - 657ms/epoch - 2ms/step
Epoch 54/500
330/330 - 1s - loss: 0.2039 - accuracy: 0.9250 - val_loss: 0.2043 - val_accuracy: 0.9235 - 682ms/epoch - 2ms/step
Epoch 55/500
330/330 - 1s - loss: 0.2007 - accuracy: 0.9272 - val_loss: 0.2091 - val_accuracy: 0.9230 - 724ms/epoch - 2ms/step
Epoch 56/500
330/330 - 1s - loss: 0.2078 - accuracy: 0.9243 - val_loss: 0.2026 - val_accuracy: 0.9308 - 658ms/epoch - 2ms/step
Epoch 57/500
330/330 - 1s - loss: 0.2055 - accuracy: 0.9247 - val_loss: 0.2107 - val_accuracy: 0.9226 - 670ms/epoch - 2ms/step
Epoch 58/500
330/330 - 1s - loss: 0.1987 - accuracy: 0.9268 - val_loss: 0.1920 - val_accuracy: 0.9262 - 647ms/epoch - 2ms/step
Epoch 59/500
330/330 - 1s - loss: 0.1996 - accuracy: 0.9260 - val_loss: 0.2079 - val_accuracy: 0.9244 - 667ms/epoch - 2ms/step
Epoch 60/500
330/330 - 1s - loss: 0.1959 - accuracy: 0.9295 - val_loss: 0.1898 - val_accuracy: 0.9349 - 701ms/epoch - 2ms/step
Epoch 61/500
330/330 - 1s - loss: 0.2068 - accuracy: 0.9252 - val_loss: 0.2116 - val_accuracy: 0.9230 - 745ms/epoch - 2ms/step
Epoch 62/500
330/330 - 1s - loss: 0.1961 - accuracy: 0.9270 - val_loss: 0.1856 - val_accuracy: 0.9308 - 758ms/epoch - 2ms/step
Epoch 63/500
330/330 - 1s - loss: 0.1989 - accuracy: 0.9266 - val_loss: 0.2014 - val_accuracy: 0.9294 - 647ms/epoch - 2ms/step
Epoch 64/500
330/330 - 1s - loss: 0.1951 - accuracy: 0.9292 - val_loss: 0.2113 - val_accuracy: 0.9194 - 684ms/epoch - 2ms/step
Epoch 65/500
330/330 - 1s - loss: 0.1918 - accuracy: 0.9286 - val_loss: 0.1904 - val_accuracy: 0.9312 - 650ms/epoch - 2ms/step
Epoch 66/500
330/330 - 1s - loss: 0.1929 - accuracy: 0.9294 - val_loss: 0.1876 - val_accuracy: 0.9349 - 681ms/epoch - 2ms/step
Epoch 67/500
330/330 - 1s - loss: 0.1928 - accuracy: 0.9294 - val_loss: 0.2129 - val_accuracy: 0.9253 - 730ms/epoch - 2ms/step
Epoch 68/500
330/330 - 1s - loss: 0.2011 - accuracy: 0.9250 - val_loss: 0.1881 - val_accuracy: 0.9298 - 732ms/epoch - 2ms/step
Epoch 69/500
330/330 - 1s - loss: 0.1912 - accuracy: 0.9287 - val_loss: 0.2008 - val_accuracy: 0.9262 - 759ms/epoch - 2ms/step
Epoch 70/500
330/330 - 1s - loss: 0.1900 - accuracy: 0.9294 - val_loss: 0.1953 - val_accuracy: 0.9244 - 692ms/epoch - 2ms/step
Epoch 71/500
330/330 - 1s - loss: 0.1904 - accuracy: 0.9292 - val_loss: 0.2246 - val_accuracy: 0.9185 - 754ms/epoch - 2ms/step
Epoch 72/500
330/330 - 1s - loss: 0.1942 - accuracy: 0.9299 - val_loss: 0.2028 - val_accuracy: 0.9298 - 723ms/epoch - 2ms/step
Epoch 73/500
330/330 - 1s - loss: 0.1889 - accuracy: 0.9306 - val_loss: 0.2082 - val_accuracy: 0.9226 - 605ms/epoch - 2ms/step
Epoch 74/500
330/330 - 1s - loss: 0.1939 - accuracy: 0.9276 - val_loss: 0.1856 - val_accuracy: 0.9344 - 764ms/epoch - 2ms/step
Epoch 75/500
330/330 - 1s - loss: 0.1893 - accuracy: 0.9291 - val_loss: 0.1923 - val_accuracy: 0.9289 - 618ms/epoch - 2ms/step
Epoch 76/500
330/330 - 1s - loss: 0.1928 - accuracy: 0.9297 - val_loss: 0.2031 - val_accuracy: 0.9239 - 605ms/epoch - 2ms/step
Epoch 77/500
330/330 - 1s - loss: 0.1904 - accuracy: 0.9280 - val_loss: 0.1799 - val_accuracy: 0.9344 - 649ms/epoch - 2ms/step
Epoch 78/500
330/330 - 1s - loss: 0.1895 - accuracy: 0.9309 - val_loss: 0.2105 - val_accuracy: 0.9239 - 698ms/epoch - 2ms/step
Epoch 79/500
330/330 - 1s - loss: 0.1949 - accuracy: 0.9262 - val_loss: 0.2160 - val_accuracy: 0.9185 - 597ms/epoch - 2ms/step
Epoch 80/500
330/330 - 1s - loss: 0.1864 - accuracy: 0.9287 - val_loss: 0.1774 - val_accuracy: 0.9353 - 582ms/epoch - 2ms/step
Epoch 81/500
330/330 - 1s - loss: 0.1838 - accuracy: 0.9314 - val_loss: 0.1876 - val_accuracy: 0.9285 - 631ms/epoch - 2ms/step
Epoch 82/500
330/330 - 1s - loss: 0.1828 - accuracy: 0.9322 - val_loss: 0.1758 - val_accuracy: 0.9358 - 709ms/epoch - 2ms/step
Epoch 83/500
330/330 - 1s - loss: 0.1849 - accuracy: 0.9298 - val_loss: 0.1832 - val_accuracy: 0.9326 - 701ms/epoch - 2ms/step
Epoch 84/500
330/330 - 1s - loss: 0.1854 - accuracy: 0.9284 - val_loss: 0.1937 - val_accuracy: 0.9212 - 743ms/epoch - 2ms/step
Epoch 85/500
330/330 - 1s - loss: 0.1854 - accuracy: 0.9313 - val_loss: 0.1757 - val_accuracy: 0.9385 - 647ms/epoch - 2ms/step
Epoch 86/500
330/330 - 1s - loss: 0.1836 - accuracy: 0.9320 - val_loss: 0.1875 - val_accuracy: 0.9276 - 644ms/epoch - 2ms/step
Epoch 87/500
330/330 - 1s - loss: 0.1903 - accuracy: 0.9286 - val_loss: 0.1888 - val_accuracy: 0.9303 - 658ms/epoch - 2ms/step
Epoch 88/500
330/330 - 1s - loss: 0.1813 - accuracy: 0.9315 - val_loss: 0.1838 - val_accuracy: 0.9353 - 656ms/epoch - 2ms/step
Epoch 89/500
330/330 - 1s - loss: 0.1844 - accuracy: 0.9308 - val_loss: 0.1882 - val_accuracy: 0.9321 - 631ms/epoch - 2ms/step
Epoch 90/500
330/330 - 1s - loss: 0.1836 - accuracy: 0.9319 - val_loss: 0.2031 - val_accuracy: 0.9203 - 577ms/epoch - 2ms/step
Epoch 91/500
330/330 - 1s - loss: 0.1806 - accuracy: 0.9335 - val_loss: 0.1734 - val_accuracy: 0.9371 - 686ms/epoch - 2ms/step
Epoch 92/500
330/330 - 1s - loss: 0.1754 - accuracy: 0.9363 - val_loss: 0.1879 - val_accuracy: 0.9276 - 664ms/epoch - 2ms/step
Epoch 93/500
330/330 - 1s - loss: 0.1736 - accuracy: 0.9331 - val_loss: 0.1978 - val_accuracy: 0.9280 - 678ms/epoch - 2ms/step
Epoch 94/500
330/330 - 1s - loss: 0.1737 - accuracy: 0.9339 - val_loss: 0.1822 - val_accuracy: 0.9326 - 548ms/epoch - 2ms/step
Epoch 95/500
330/330 - 1s - loss: 0.1821 - accuracy: 0.9322 - val_loss: 0.1723 - val_accuracy: 0.9394 - 530ms/epoch - 2ms/step
Epoch 96/500
330/330 - 1s - loss: 0.1719 - accuracy: 0.9361 - val_loss: 0.1692 - val_accuracy: 0.9371 - 633ms/epoch - 2ms/step
Epoch 97/500
330/330 - 1s - loss: 0.1796 - accuracy: 0.9309 - val_loss: 0.2025 - val_accuracy: 0.9162 - 667ms/epoch - 2ms/step
Epoch 98/500
330/330 - 1s - loss: 0.1699 - accuracy: 0.9343 - val_loss: 0.1721 - val_accuracy: 0.9326 - 659ms/epoch - 2ms/step
Epoch 99/500
330/330 - 1s - loss: 0.1664 - accuracy: 0.9377 - val_loss: 0.2004 - val_accuracy: 0.9271 - 610ms/epoch - 2ms/step
Epoch 100/500
330/330 - 1s - loss: 0.1722 - accuracy: 0.9338 - val_loss: 0.1957 - val_accuracy: 0.9285 - 667ms/epoch - 2ms/step
Epoch 101/500
330/330 - 1s - loss: 0.1686 - accuracy: 0.9356 - val_loss: 0.2183 - val_accuracy: 0.9248 - 635ms/epoch - 2ms/step
Epoch 102/500
330/330 - 1s - loss: 0.1664 - accuracy: 0.9362 - val_loss: 0.1831 - val_accuracy: 0.9262 - 684ms/epoch - 2ms/step
Epoch 103/500
330/330 - 1s - loss: 0.1737 - accuracy: 0.9347 - val_loss: 0.1940 - val_accuracy: 0.9171 - 652ms/epoch - 2ms/step
Epoch 104/500
330/330 - 1s - loss: 0.1682 - accuracy: 0.9361 - val_loss: 0.1616 - val_accuracy: 0.9440 - 738ms/epoch - 2ms/step
Epoch 105/500
330/330 - 1s - loss: 0.1690 - accuracy: 0.9346 - val_loss: 0.1899 - val_accuracy: 0.9280 - 641ms/epoch - 2ms/step
Epoch 106/500
330/330 - 1s - loss: 0.1660 - accuracy: 0.9367 - val_loss: 0.1593 - val_accuracy: 0.9431 - 771ms/epoch - 2ms/step
Epoch 107/500
330/330 - 1s - loss: 0.1635 - accuracy: 0.9376 - val_loss: 0.1876 - val_accuracy: 0.9253 - 692ms/epoch - 2ms/step
Epoch 108/500
330/330 - 1s - loss: 0.1665 - accuracy: 0.9373 - val_loss: 0.1865 - val_accuracy: 0.9276 - 705ms/epoch - 2ms/step
Epoch 109/500
330/330 - 1s - loss: 0.1659 - accuracy: 0.9360 - val_loss: 0.1842 - val_accuracy: 0.9303 - 653ms/epoch - 2ms/step
Epoch 110/500
330/330 - 1s - loss: 0.1678 - accuracy: 0.9344 - val_loss: 0.1932 - val_accuracy: 0.9257 - 727ms/epoch - 2ms/step
Epoch 111/500
330/330 - 1s - loss: 0.1678 - accuracy: 0.9359 - val_loss: 0.1693 - val_accuracy: 0.9362 - 635ms/epoch - 2ms/step
Epoch 112/500
330/330 - 1s - loss: 0.1658 - accuracy: 0.9374 - val_loss: 0.1829 - val_accuracy: 0.9280 - 691ms/epoch - 2ms/step
Epoch 113/500
330/330 - 1s - loss: 0.1565 - accuracy: 0.9411 - val_loss: 0.1561 - val_accuracy: 0.9453 - 530ms/epoch - 2ms/step
Epoch 114/500
330/330 - 1s - loss: 0.1581 - accuracy: 0.9392 - val_loss: 0.1932 - val_accuracy: 0.9235 - 697ms/epoch - 2ms/step
Epoch 115/500
330/330 - 1s - loss: 0.1575 - accuracy: 0.9388 - val_loss: 0.1753 - val_accuracy: 0.9362 - 679ms/epoch - 2ms/step
Epoch 116/500
330/330 - 1s - loss: 0.1569 - accuracy: 0.9384 - val_loss: 0.1561 - val_accuracy: 0.9421 - 816ms/epoch - 2ms/step
Epoch 117/500
330/330 - 1s - loss: 0.1544 - accuracy: 0.9417 - val_loss: 0.1660 - val_accuracy: 0.9380 - 691ms/epoch - 2ms/step
Epoch 118/500
330/330 - 1s - loss: 0.1576 - accuracy: 0.9411 - val_loss: 0.1716 - val_accuracy: 0.9339 - 706ms/epoch - 2ms/step
Epoch 119/500
330/330 - 1s - loss: 0.1582 - accuracy: 0.9380 - val_loss: 0.1555 - val_accuracy: 0.9440 - 743ms/epoch - 2ms/step
Epoch 120/500
330/330 - 1s - loss: 0.1517 - accuracy: 0.9410 - val_loss: 0.1509 - val_accuracy: 0.9481 - 663ms/epoch - 2ms/step
Epoch 121/500
330/330 - 1s - loss: 0.1533 - accuracy: 0.9423 - val_loss: 0.1758 - val_accuracy: 0.9330 - 674ms/epoch - 2ms/step
Epoch 122/500
330/330 - 1s - loss: 0.1562 - accuracy: 0.9391 - val_loss: 0.1878 - val_accuracy: 0.9289 - 566ms/epoch - 2ms/step
Epoch 123/500
330/330 - 1s - loss: 0.1535 - accuracy: 0.9425 - val_loss: 0.1564 - val_accuracy: 0.9462 - 692ms/epoch - 2ms/step
Epoch 124/500
330/330 - 1s - loss: 0.1521 - accuracy: 0.9418 - val_loss: 0.1486 - val_accuracy: 0.9449 - 724ms/epoch - 2ms/step
Epoch 125/500
330/330 - 1s - loss: 0.1508 - accuracy: 0.9428 - val_loss: 0.1635 - val_accuracy: 0.9339 - 613ms/epoch - 2ms/step
Epoch 126/500
330/330 - 1s - loss: 0.1566 - accuracy: 0.9408 - val_loss: 0.1492 - val_accuracy: 0.9440 - 635ms/epoch - 2ms/step
Epoch 127/500
330/330 - 1s - loss: 0.1517 - accuracy: 0.9431 - val_loss: 0.2354 - val_accuracy: 0.8993 - 657ms/epoch - 2ms/step
Epoch 128/500
330/330 - 1s - loss: 0.1517 - accuracy: 0.9416 - val_loss: 0.1682 - val_accuracy: 0.9312 - 622ms/epoch - 2ms/step
Epoch 129/500
330/330 - 1s - loss: 0.1563 - accuracy: 0.9398 - val_loss: 0.1660 - val_accuracy: 0.9326 - 667ms/epoch - 2ms/step
Epoch 130/500
330/330 - 1s - loss: 0.1497 - accuracy: 0.9435 - val_loss: 0.1483 - val_accuracy: 0.9476 - 707ms/epoch - 2ms/step
Epoch 131/500
330/330 - 1s - loss: 0.1502 - accuracy: 0.9436 - val_loss: 0.1763 - val_accuracy: 0.9362 - 667ms/epoch - 2ms/step
Epoch 132/500
330/330 - 1s - loss: 0.1496 - accuracy: 0.9454 - val_loss: 0.1570 - val_accuracy: 0.9426 - 720ms/epoch - 2ms/step
Epoch 133/500
330/330 - 1s - loss: 0.1468 - accuracy: 0.9434 - val_loss: 0.1542 - val_accuracy: 0.9449 - 767ms/epoch - 2ms/step
Epoch 134/500
330/330 - 1s - loss: 0.1448 - accuracy: 0.9454 - val_loss: 0.1518 - val_accuracy: 0.9403 - 772ms/epoch - 2ms/step
Epoch 135/500
330/330 - 1s - loss: 0.1505 - accuracy: 0.9419 - val_loss: 0.2057 - val_accuracy: 0.9089 - 728ms/epoch - 2ms/step
Epoch 136/500
330/330 - 1s - loss: 0.1449 - accuracy: 0.9453 - val_loss: 0.1506 - val_accuracy: 0.9444 - 816ms/epoch - 2ms/step
Epoch 137/500
330/330 - 1s - loss: 0.1410 - accuracy: 0.9483 - val_loss: 0.1810 - val_accuracy: 0.9321 - 635ms/epoch - 2ms/step
Epoch 138/500
330/330 - 1s - loss: 0.1473 - accuracy: 0.9430 - val_loss: 0.1565 - val_accuracy: 0.9367 - 667ms/epoch - 2ms/step
Epoch 139/500
330/330 - 1s - loss: 0.1477 - accuracy: 0.9440 - val_loss: 0.1435 - val_accuracy: 0.9499 - 668ms/epoch - 2ms/step
Epoch 140/500
330/330 - 1s - loss: 0.1436 - accuracy: 0.9450 - val_loss: 0.1471 - val_accuracy: 0.9449 - 701ms/epoch - 2ms/step
Epoch 141/500
330/330 - 1s - loss: 0.1427 - accuracy: 0.9472 - val_loss: 0.1636 - val_accuracy: 0.9339 - 660ms/epoch - 2ms/step
... [epochs 142-420 trimmed for brevity: over this stretch, training loss fell steadily from ~0.14 to ~0.09 and training accuracy rose from ~0.947 to ~0.971, while validation accuracy fluctuated between ~0.93 and ~0.96; the best validation loss in this range (0.1113, val_accuracy 0.9649) occurred at epoch 416] ...
Epoch 421/500
330/330 - 1s - loss: 0.0895 - accuracy: 0.9707 - val_loss: 0.1206 - val_accuracy: 0.9595 - 801ms/epoch - 2ms/step
Epoch 422/500
330/330 - 1s - loss: 0.0928 - accuracy: 0.9686 - val_loss: 0.1161 - val_accuracy: 0.9640 - 740ms/epoch - 2ms/step
Epoch 423/500
330/330 - 1s - loss: 0.0877 - accuracy: 0.9716 - val_loss: 0.1367 - val_accuracy: 0.9599 - 810ms/epoch - 2ms/step
Epoch 424/500
330/330 - 1s - loss: 0.0890 - accuracy: 0.9705 - val_loss: 0.1201 - val_accuracy: 0.9622 - 767ms/epoch - 2ms/step
Epoch 425/500
330/330 - 1s - loss: 0.0967 - accuracy: 0.9681 - val_loss: 0.1194 - val_accuracy: 0.9622 - 711ms/epoch - 2ms/step
Epoch 426/500
330/330 - 1s - loss: 0.0910 - accuracy: 0.9698 - val_loss: 0.1183 - val_accuracy: 0.9604 - 762ms/epoch - 2ms/step
Epoch 427/500
330/330 - 1s - loss: 0.0909 - accuracy: 0.9694 - val_loss: 0.1280 - val_accuracy: 0.9590 - 788ms/epoch - 2ms/step
Epoch 428/500
330/330 - 1s - loss: 0.0936 - accuracy: 0.9695 - val_loss: 0.1175 - val_accuracy: 0.9613 - 674ms/epoch - 2ms/step
Epoch 429/500
330/330 - 1s - loss: 0.0902 - accuracy: 0.9697 - val_loss: 0.1223 - val_accuracy: 0.9599 - 837ms/epoch - 3ms/step
Epoch 430/500
330/330 - 1s - loss: 0.0877 - accuracy: 0.9717 - val_loss: 0.1183 - val_accuracy: 0.9631 - 864ms/epoch - 3ms/step
Epoch 431/500
330/330 - 1s - loss: 0.0923 - accuracy: 0.9695 - val_loss: 0.1311 - val_accuracy: 0.9585 - 793ms/epoch - 2ms/step
Epoch 432/500
330/330 - 1s - loss: 0.0870 - accuracy: 0.9724 - val_loss: 0.1326 - val_accuracy: 0.9581 - 726ms/epoch - 2ms/step
Epoch 433/500
330/330 - 1s - loss: 0.0896 - accuracy: 0.9706 - val_loss: 0.1253 - val_accuracy: 0.9626 - 709ms/epoch - 2ms/step
Epoch 434/500
330/330 - 1s - loss: 0.0920 - accuracy: 0.9691 - val_loss: 0.1336 - val_accuracy: 0.9595 - 660ms/epoch - 2ms/step
Epoch 435/500
330/330 - 1s - loss: 0.0894 - accuracy: 0.9707 - val_loss: 0.1266 - val_accuracy: 0.9636 - 608ms/epoch - 2ms/step
Epoch 436/500
330/330 - 1s - loss: 0.0873 - accuracy: 0.9713 - val_loss: 0.1141 - val_accuracy: 0.9640 - 651ms/epoch - 2ms/step
Epoch 437/500
330/330 - 1s - loss: 0.0931 - accuracy: 0.9692 - val_loss: 0.1136 - val_accuracy: 0.9636 - 633ms/epoch - 2ms/step
Epoch 438/500
330/330 - 1s - loss: 0.0849 - accuracy: 0.9711 - val_loss: 0.1286 - val_accuracy: 0.9599 - 624ms/epoch - 2ms/step
Epoch 439/500
330/330 - 1s - loss: 0.0841 - accuracy: 0.9718 - val_loss: 0.1441 - val_accuracy: 0.9522 - 658ms/epoch - 2ms/step
Epoch 440/500
330/330 - 1s - loss: 0.0905 - accuracy: 0.9708 - val_loss: 0.1299 - val_accuracy: 0.9608 - 718ms/epoch - 2ms/step
Epoch 441/500
330/330 - 1s - loss: 0.0883 - accuracy: 0.9704 - val_loss: 0.1258 - val_accuracy: 0.9613 - 698ms/epoch - 2ms/step
Epoch 442/500
330/330 - 1s - loss: 0.0893 - accuracy: 0.9711 - val_loss: 0.1171 - val_accuracy: 0.9640 - 646ms/epoch - 2ms/step
Epoch 443/500
330/330 - 1s - loss: 0.0825 - accuracy: 0.9729 - val_loss: 0.1343 - val_accuracy: 0.9554 - 698ms/epoch - 2ms/step
Epoch 444/500
330/330 - 1s - loss: 0.0901 - accuracy: 0.9710 - val_loss: 0.1220 - val_accuracy: 0.9640 - 641ms/epoch - 2ms/step
Epoch 445/500
330/330 - 1s - loss: 0.0891 - accuracy: 0.9702 - val_loss: 0.1388 - val_accuracy: 0.9567 - 625ms/epoch - 2ms/step
Epoch 446/500
330/330 - 1s - loss: 0.0866 - accuracy: 0.9713 - val_loss: 0.1248 - val_accuracy: 0.9576 - 638ms/epoch - 2ms/step
Epoch 447/500
330/330 - 1s - loss: 0.0865 - accuracy: 0.9711 - val_loss: 0.1221 - val_accuracy: 0.9590 - 608ms/epoch - 2ms/step
Epoch 448/500
330/330 - 1s - loss: 0.0881 - accuracy: 0.9710 - val_loss: 0.1432 - val_accuracy: 0.9494 - 607ms/epoch - 2ms/step
Epoch 449/500
330/330 - 1s - loss: 0.0895 - accuracy: 0.9711 - val_loss: 0.1237 - val_accuracy: 0.9599 - 669ms/epoch - 2ms/step
Epoch 450/500
330/330 - 1s - loss: 0.0837 - accuracy: 0.9722 - val_loss: 0.1193 - val_accuracy: 0.9613 - 669ms/epoch - 2ms/step
Epoch 451/500
330/330 - 1s - loss: 0.0916 - accuracy: 0.9694 - val_loss: 0.1166 - val_accuracy: 0.9640 - 665ms/epoch - 2ms/step
Epoch 452/500
330/330 - 1s - loss: 0.0874 - accuracy: 0.9715 - val_loss: 0.1454 - val_accuracy: 0.9522 - 570ms/epoch - 2ms/step
Epoch 453/500
330/330 - 1s - loss: 0.0876 - accuracy: 0.9710 - val_loss: 0.1256 - val_accuracy: 0.9581 - 648ms/epoch - 2ms/step
Epoch 454/500
330/330 - 1s - loss: 0.0876 - accuracy: 0.9718 - val_loss: 0.1203 - val_accuracy: 0.9617 - 605ms/epoch - 2ms/step
Epoch 455/500
330/330 - 1s - loss: 0.0865 - accuracy: 0.9720 - val_loss: 0.1365 - val_accuracy: 0.9581 - 588ms/epoch - 2ms/step
Epoch 456/500
330/330 - 1s - loss: 0.0840 - accuracy: 0.9721 - val_loss: 0.1216 - val_accuracy: 0.9622 - 666ms/epoch - 2ms/step
Epoch 457/500
330/330 - 1s - loss: 0.0899 - accuracy: 0.9713 - val_loss: 0.1169 - val_accuracy: 0.9645 - 719ms/epoch - 2ms/step
Epoch 458/500
330/330 - 1s - loss: 0.0867 - accuracy: 0.9728 - val_loss: 0.1187 - val_accuracy: 0.9636 - 655ms/epoch - 2ms/step
Epoch 459/500
330/330 - 1s - loss: 0.0856 - accuracy: 0.9697 - val_loss: 0.1321 - val_accuracy: 0.9576 - 749ms/epoch - 2ms/step
Epoch 460/500
330/330 - 1s - loss: 0.0834 - accuracy: 0.9735 - val_loss: 0.1285 - val_accuracy: 0.9567 - 684ms/epoch - 2ms/step
Epoch 461/500
330/330 - 1s - loss: 0.0898 - accuracy: 0.9699 - val_loss: 0.1192 - val_accuracy: 0.9626 - 576ms/epoch - 2ms/step
Epoch 462/500
330/330 - 1s - loss: 0.0880 - accuracy: 0.9698 - val_loss: 0.1251 - val_accuracy: 0.9608 - 636ms/epoch - 2ms/step
Epoch 463/500
330/330 - 1s - loss: 0.0879 - accuracy: 0.9710 - val_loss: 0.1222 - val_accuracy: 0.9636 - 612ms/epoch - 2ms/step
Epoch 464/500
330/330 - 1s - loss: 0.0830 - accuracy: 0.9716 - val_loss: 0.1203 - val_accuracy: 0.9640 - 562ms/epoch - 2ms/step
Epoch 465/500
330/330 - 1s - loss: 0.0835 - accuracy: 0.9719 - val_loss: 0.1232 - val_accuracy: 0.9599 - 588ms/epoch - 2ms/step
Epoch 466/500
330/330 - 1s - loss: 0.0856 - accuracy: 0.9713 - val_loss: 0.1621 - val_accuracy: 0.9499 - 580ms/epoch - 2ms/step
Epoch 467/500
330/330 - 1s - loss: 0.0917 - accuracy: 0.9710 - val_loss: 0.1234 - val_accuracy: 0.9626 - 636ms/epoch - 2ms/step
Epoch 468/500
330/330 - 1s - loss: 0.0856 - accuracy: 0.9707 - val_loss: 0.1337 - val_accuracy: 0.9595 - 658ms/epoch - 2ms/step
Epoch 469/500
330/330 - 1s - loss: 0.0859 - accuracy: 0.9724 - val_loss: 0.1261 - val_accuracy: 0.9590 - 688ms/epoch - 2ms/step
Epoch 470/500
330/330 - 1s - loss: 0.0842 - accuracy: 0.9717 - val_loss: 0.1321 - val_accuracy: 0.9581 - 741ms/epoch - 2ms/step
Epoch 471/500
330/330 - 1s - loss: 0.0827 - accuracy: 0.9728 - val_loss: 0.1340 - val_accuracy: 0.9572 - 599ms/epoch - 2ms/step
Epoch 472/500
330/330 - 1s - loss: 0.0898 - accuracy: 0.9687 - val_loss: 0.1219 - val_accuracy: 0.9636 - 574ms/epoch - 2ms/step
Epoch 473/500
330/330 - 1s - loss: 0.0861 - accuracy: 0.9718 - val_loss: 0.1322 - val_accuracy: 0.9572 - 649ms/epoch - 2ms/step
Epoch 474/500
330/330 - 1s - loss: 0.0879 - accuracy: 0.9710 - val_loss: 0.1177 - val_accuracy: 0.9617 - 643ms/epoch - 2ms/step
Epoch 475/500
330/330 - 1s - loss: 0.0827 - accuracy: 0.9717 - val_loss: 0.1303 - val_accuracy: 0.9604 - 707ms/epoch - 2ms/step
Epoch 476/500
330/330 - 1s - loss: 0.0901 - accuracy: 0.9705 - val_loss: 0.1219 - val_accuracy: 0.9622 - 707ms/epoch - 2ms/step
Epoch 477/500
330/330 - 1s - loss: 0.0860 - accuracy: 0.9708 - val_loss: 0.1235 - val_accuracy: 0.9622 - 600ms/epoch - 2ms/step
Epoch 478/500
330/330 - 1s - loss: 0.0849 - accuracy: 0.9723 - val_loss: 0.1206 - val_accuracy: 0.9640 - 590ms/epoch - 2ms/step
Epoch 479/500
330/330 - 1s - loss: 0.0853 - accuracy: 0.9712 - val_loss: 0.1255 - val_accuracy: 0.9590 - 639ms/epoch - 2ms/step
Epoch 480/500
330/330 - 1s - loss: 0.0883 - accuracy: 0.9694 - val_loss: 0.1244 - val_accuracy: 0.9631 - 718ms/epoch - 2ms/step
Epoch 481/500
330/330 - 1s - loss: 0.0899 - accuracy: 0.9690 - val_loss: 0.1226 - val_accuracy: 0.9617 - 599ms/epoch - 2ms/step
Epoch 482/500
330/330 - 1s - loss: 0.0870 - accuracy: 0.9704 - val_loss: 0.1269 - val_accuracy: 0.9599 - 606ms/epoch - 2ms/step
Epoch 483/500
330/330 - 1s - loss: 0.0864 - accuracy: 0.9712 - val_loss: 0.1248 - val_accuracy: 0.9599 - 714ms/epoch - 2ms/step
Epoch 484/500
330/330 - 1s - loss: 0.0829 - accuracy: 0.9722 - val_loss: 0.1453 - val_accuracy: 0.9563 - 668ms/epoch - 2ms/step
Epoch 485/500
330/330 - 1s - loss: 0.0861 - accuracy: 0.9728 - val_loss: 0.1146 - val_accuracy: 0.9649 - 698ms/epoch - 2ms/step
Epoch 486/500
330/330 - 1s - loss: 0.0823 - accuracy: 0.9734 - val_loss: 0.1188 - val_accuracy: 0.9631 - 732ms/epoch - 2ms/step
Epoch 487/500
330/330 - 1s - loss: 0.0877 - accuracy: 0.9718 - val_loss: 0.1276 - val_accuracy: 0.9608 - 652ms/epoch - 2ms/step
Epoch 488/500
330/330 - 1s - loss: 0.0853 - accuracy: 0.9715 - val_loss: 0.1255 - val_accuracy: 0.9604 - 651ms/epoch - 2ms/step
Epoch 489/500
330/330 - 1s - loss: 0.0862 - accuracy: 0.9722 - val_loss: 0.1222 - val_accuracy: 0.9636 - 604ms/epoch - 2ms/step
Epoch 490/500
330/330 - 1s - loss: 0.0826 - accuracy: 0.9718 - val_loss: 0.1371 - val_accuracy: 0.9549 - 651ms/epoch - 2ms/step
Epoch 491/500
330/330 - 1s - loss: 0.0908 - accuracy: 0.9689 - val_loss: 0.1253 - val_accuracy: 0.9595 - 581ms/epoch - 2ms/step
Epoch 492/500
330/330 - 1s - loss: 0.0832 - accuracy: 0.9717 - val_loss: 0.1515 - val_accuracy: 0.9563 - 619ms/epoch - 2ms/step
Epoch 493/500
330/330 - 1s - loss: 0.0887 - accuracy: 0.9695 - val_loss: 0.1185 - val_accuracy: 0.9640 - 560ms/epoch - 2ms/step
Epoch 494/500
330/330 - 1s - loss: 0.0834 - accuracy: 0.9709 - val_loss: 0.1233 - val_accuracy: 0.9645 - 550ms/epoch - 2ms/step
Epoch 495/500
330/330 - 1s - loss: 0.0839 - accuracy: 0.9724 - val_loss: 0.1234 - val_accuracy: 0.9613 - 546ms/epoch - 2ms/step
Epoch 496/500
330/330 - 1s - loss: 0.0814 - accuracy: 0.9731 - val_loss: 0.1086 - val_accuracy: 0.9677 - 654ms/epoch - 2ms/step
Epoch 497/500
330/330 - 1s - loss: 0.0834 - accuracy: 0.9713 - val_loss: 0.1285 - val_accuracy: 0.9585 - 733ms/epoch - 2ms/step
Epoch 498/500
330/330 - 1s - loss: 0.0920 - accuracy: 0.9679 - val_loss: 0.1246 - val_accuracy: 0.9613 - 626ms/epoch - 2ms/step
Epoch 499/500
330/330 - 1s - loss: 0.0894 - accuracy: 0.9710 - val_loss: 0.1196 - val_accuracy: 0.9622 - 614ms/epoch - 2ms/step
Epoch 500/500
330/330 - 1s - loss: 0.0826 - accuracy: 0.9723 - val_loss: 0.1170 - val_accuracy: 0.9649 - 637ms/epoch - 2ms/step
Show the code
#-----------------------------------------------------------------------------
# Save Files for Dashboard ---------------------------------------------------
#-----------------------------------------------------------------------------

saveRDS(train_data_encoded, "C:/Users/ClaudieCoulombe/OneDrive - System 3/General P/Employee Turnover/train_data_encoded.rds")
saveRDS(validation_data_encoded, "C:/Users/ClaudieCoulombe/OneDrive - System 3/General P/Employee Turnover/validation_data_encoded.rds")
saveRDS(test_data_encoded, "C:/Users/ClaudieCoulombe/OneDrive - System 3/General P/Employee Turnover/test_data_encoded.rds")

5.2 Assessing Training Set Accuracy

Let’s see how well the model fits the data it was trained on. This establishes a baseline against which to compare accuracy on the test set, to determine the extent of overfitting.

Show the code
#-----------------------------------------------------------------------------
# Evaluate Accuracy on Training Set ------------------------------------------
#-----------------------------------------------------------------------------

# Evaluate the saved best model
best_mlp_model <- load_model_hdf5("best_model.h5")  # Load the best model

# Use the saved best model to evaluate accuracy on the training dataset

training_inputs <- as.matrix(train_data_encoded[, predictors])
training_labels <- as.matrix(as.numeric(train_data[["left_numeric"]]))

train_eval_results_nn <- best_mlp_model %>% evaluate(training_inputs, training_labels, batch_size = 32)
330/330 - 1s - loss: 0.0705 - accuracy: 0.9776 - 549ms/epoch - 2ms/step
Show the code
predictions <- best_mlp_model %>% predict(training_inputs)
330/330 - 0s - 433ms/epoch - 1ms/step
Show the code
train_accuracy_nn <- train_eval_results_nn[["accuracy"]]*100
  • Accuracy on training set: 97.7609%

5.3 Assessing Test Set Accuracy

Now that we have the accuracy on the training set as the baseline, let’s see how well the model does on new data, to get a sense of how well it generalizes.

Show the code
#-----------------------------------------------------------------------------
# Evaluate Accuracy on Test Set ----------------------------------------------
#-----------------------------------------------------------------------------

# Use the saved best model to evaluate accuracy on the test dataset

test_inputs <- as.matrix(test_data_encoded[, predictors])
test_labels <- as.matrix(as.numeric(test_data[["left_numeric"]]))

test_eval_results_nn <- best_mlp_model %>% evaluate(test_inputs, test_labels, batch_size = 32)
71/71 - 0s - loss: 0.1108 - accuracy: 0.9655 - 92ms/epoch - 1ms/step
Show the code
predictions <- best_mlp_model %>% predict(test_inputs)
71/71 - 0s - 76ms/epoch - 1ms/step
Show the code
test_accuracy_nn <- test_eval_results_nn[["accuracy"]]*100
  • Accuracy on test set: 96.5548%
  • Pretty good! There doesn’t seem to be too much overfitting.

5.4 Testing Precision, Recall, and Specificity

Show the code
predicted_classes <- ifelse(predictions > 0.5, 1, 0)

confusion <- confusionMatrix(
  factor(predicted_classes),
  factor(test_labels),
  positive = "1"
)

precision_nn <- confusion$byClass["Precision"]
recall_nn <- confusion$byClass["Recall"]
specificity_nn <- confusion$byClass["Specificity"]

# Calculate F1 Score
f1_score_nn <- confusion$byClass["F1"]

# Print the results
cat("Precision:", precision_nn, "\n")
Precision: 0.9274 
Show the code
cat("Recall (Sensitivity):", recall_nn, "\n")
Recall (Sensitivity): 0.9274 
Show the code
cat("F1 Score:", f1_score_nn, "\n")
F1 Score: 0.9274 
Show the code
cat("Specificity:", specificity_nn, "\n")
Specificity: 0.9774 
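For reference, each of these metrics maps directly onto counts from the confusion matrix. A minimal sketch with made-up counts (illustrative only, not the actual test-set values):

```r
# Hypothetical confusion-matrix counts, for illustration only
tp <- 90    # true positives: predicted "left", actually left
fp <- 10    # false positives: predicted "left", actually stayed
fn <- 10    # false negatives: predicted "stayed", actually left
tn <- 400   # true negatives: predicted "stayed", actually stayed

precision   <- tp / (tp + fp)   # of predicted leavers, the share who actually left
recall      <- tp / (tp + fn)   # of actual leavers, the share we caught (sensitivity)
specificity <- tn / (tn + fp)   # of actual stayers, the share correctly identified
f1          <- 2 * precision * recall / (precision + recall)
```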

For the neural network model, we got good accuracy and F1 scores, even without doing any tuning. It’s likely that we could get the model to be even more accurate by tuning the parameters - however, for this demo, we’ll stick with this initial model.

Pros of the Neural Network: Compared to the logistic regression, the neural network achieved roughly 11.7 percentage points higher accuracy. While the logistic regression achieved approximately 85% accuracy (a 15% error rate), the neural network reduced the error rate to about 3.5%, cutting errors by more than three-quarters. In practical terms, if we deployed both models in the real world, we would expect the neural network to predict turnover correctly for approximately 12 more people out of every 100. We also didn’t have to categorize our continuous predictors, since the neural network can handle non-linear relationships directly.

Cons: While the neural network makes better predictions than the logistic regression, it’s less easily interpretable. It doesn’t provide a table of coefficient estimates and p-values for each predictor. To compensate, we can run experiments with the model to see how predictions change when we perturb the predictors (e.g., if satisfaction scores changed by 5%), but the results still won’t be as clear-cut as the output of a logistic regression. We also can’t see how the hidden layers operate. Finally, training the model with a different seed would produce different results, so repeatability can be a problem.


6 Random Forest

One approach that offers relatively more interpretability than a neural network (but still less than a logistic regression) is a Random Forest, which we explore in this section. While not as transparent as regression models, Random Forests provide insights into the relative importance of predictors, helping explain why some variables contribute more to the model’s predictions. Like a neural network, they can capture non-linear relationships and offer an advantage in predictive accuracy. Random Forests are an ensemble learning method that combines many decision trees to predict an outcome.

A decision tree is a simple machine learning model that makes predictions by recursively splitting data into subsets based on the values of input features. It’s made up of:

  • Root node: The starting point of the tree, containing the whole dataset.

  • Internal nodes: Points where the data is split based on a feature

  • Leaf nodes: Endpoints that represent the final prediction for a subset of data.

The tree branches from the root to the leaf nodes.

The root node starts with all the data and chooses the feature and split that best separates the target variable (e.g., satisfaction_level > 0.5). At each node, the tree repeats the process for each subset of data, splitting again on the feature and condition that best separates the target. The tree stops splitting (creating leaf nodes) when a stopping criterion is met, such as:

  • A node has a minimum number of samples

  • The maximum depth is reached

  • The data in the node is pure (all samples belong to one class).

For classification, each leaf node assigns a class label (e.g., left/stayed). For regression, each leaf node predicts the mean value of the target variable for the data in that node.

At each node, the tree splits the data into 2 or more subsets based on a decision rule. The decision rule is based on one feature and one condition (e.g., satisfaction_level > 0.5). The algorithm evaluates all features and potential split points to choose the one that optimizes the purity of the resulting nodes.

The tree chooses splits to maximize the “purity” of resulting nodes.

Purity metrics (for classification):

  • Gini impurity: measures how mixed the classes are in a node (smaller is better)

  • Entropy: measures the randomness or disorder of the data

Variance reduction (for regression): measures how much the variability in the outcome is reduced.

  • For example, the first split at the root might be satisfaction_level > 0.5: employees with high satisfaction go to one branch, the rest to the other. Within the high-satisfaction branch, the next split might be time_spend_company > 3, separating highly satisfied employees into short-tenured and long-tenured groups. In the low-satisfaction branch, a leaf node might predict that employees with satisfaction_level < 0.5 and time_spend_company > 3 are likely to leave (class 1).
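As an illustration of the purity calculation, Gini impurity and a size-weighted split score can be computed in a few lines. This is a toy sketch; gini_impurity and weighted_gini are hypothetical helpers written here for demonstration, not functions from the modeling packages used in this project:

```r
# Gini impurity of a node: 1 - sum(p_k^2), where p_k is the share of class k
gini_impurity <- function(labels) {
  p <- table(labels) / length(labels)
  1 - sum(p^2)
}

gini_impurity(c("left", "left", "left"))              # pure node, impurity 0
gini_impurity(c("left", "left", "stayed", "stayed"))  # 50/50 node, impurity 0.5

# A candidate split is scored by the size-weighted impurity of its child nodes;
# the tree picks the feature and cut point that minimize this score
weighted_gini <- function(left_labels, right_labels) {
  n <- length(left_labels) + length(right_labels)
  (length(left_labels) / n) * gini_impurity(left_labels) +
    (length(right_labels) / n) * gini_impurity(right_labels)
}
```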

How does prediction work? For a new data point, the tree starts at the root node and follows the splits down to a specific leaf node. The prediction for that data point comes from the value in the leaf node it reaches.

A Random Forest builds multiple decision trees and combines their predictions to improve accuracy and robustness. Instead of relying on just one tree, it uses a forest of many decision trees. It’s used for classification and regression tasks, and can offer more interpretability than other machine learning techniques like neural networks.

A Random Forest is essentially a collection (or “forest”) of decision trees. Each tree in the forest makes its own prediction for a data point, and the forest aggregates these predictions to produce the final result. In classification, the forest predicts the majority class from all trees’ votes. In regression, the forest predicts the average of all trees’ predictions.

This is helpful because individual decision trees are prone to overfitting, especially when they grow deep. By combining multiple trees, random forests can help reduce overfitting.

Two main features can help ensure that each tree in the forest is different, which reduces correlation between trees:

  • (1) Bagging: Each tree is trained on a random sample (with replacement) of the training data. This means some data points are used multiple times in one tree while others are left out (called out-of-bag samples).

  • (2) Feature Randomization: At each node, only a random subset of features is considered for splitting. This forces the trees to learn different patterns, even if they are trained on similar data.

Since each tree is trained on a bootstrap sample, the data points not included (OOB samples) can be used to estimate the model’s error, without needing a separate validation set.

How it works:

  • Data Sampling: Take multiple bootstrap samples from the training data. Train one tree on each sample.

  • Tree Building: Grow each decision tree to its maximum depth. At each split, use a random subset of features to determine the best split.

  • Aggregation: For classification, use the majority voting across all trees to figure out the outcome (left/stayed). For regression, use the average prediction from all the trees.
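The bootstrap-and-vote mechanics above can be sketched in a few lines of base R. This is a toy illustration of the idea, not the internals of the randomForest package:

```r
set.seed(42)
n <- 100
rows <- seq_len(n)

# Bagging: each tree trains on a bootstrap sample (drawn with replacement);
# rows never drawn are that tree's out-of-bag (OOB) sample
boot_rows <- sample(rows, size = n, replace = TRUE)
oob_rows  <- setdiff(rows, boot_rows)
length(unique(boot_rows)) / n   # typically ~0.63 of rows appear at least once

# Aggregation: majority vote across trees for one observation
tree_votes <- c(1, 0, 1, 1, 0)                    # predictions from five trees
final_pred <- as.integer(mean(tree_votes) > 0.5)  # majority vote: "left" (1)
```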

6.1 Fitting the Random Forest

Show the code
pacman::p_load(randomForest)

# We have to set the number of predictors to randomly choose from at each split. Often, the default value for classification is  the square root of the total number of predictors.

k <- round(sqrt(7))

train_validation_data_rf <- train_validation_data %>%
  select(-contains("_cat"), -left_numeric)

# Step 1: Train the Random Forest model
set.seed(123) # For reproducibility
rf_model <- randomForest(
  left ~ satisfaction_level + last_evaluation + average_monthly_hours +
    number_project + time_spend_company + salary + department,
  data = train_validation_data_rf,
  ntree = 500, # Number of trees
  mtry = k,    # Number of predictors randomly sampled at each split (tuning parameter)
  importance = TRUE # To assess variable importance
)

# Step 2: Evaluate the model
# View the model summary
print(rf_model)

Call:
 randomForest(formula = left ~ satisfaction_level + last_evaluation +      average_monthly_hours + number_project + time_spend_company +      salary + department, data = train_validation_data_rf, ntree = 500,      mtry = k, importance = TRUE) 
               Type of random forest: classification
                     Number of trees: 500
No. of variables tried at each split: 3

        OOB estimate of  error rate: 0.75%
Confusion matrix:
     0    1 class.error
0 9686   15    0.001546
1   81 2953    0.026697
Show the code
# We can see that this is a classification model. The model used 500 trees and chose 3 random variables at each split. The out-of-bag (OOB) error rate was 0.75%, meaning the model misclassified around 0.75% of instances during training (very low, indicating strong performance on the training data). The confusion matrix tells us the model correctly classified 9686 instances as "stayed" and misclassified 15 of them (error rate 0.15%), and correctly classified 2953 instances as "left" while misclassifying 81 (error rate 2.67%). The model performs well overall, with a slightly higher error rate for employees who left (common in datasets where one class is less frequent).

#-----------------------------------------------------------------------------
# Save Files for Dashboard ---------------------------------------------------
#-----------------------------------------------------------------------------

saveRDS(test_data, "C:/Users/ClaudieCoulombe/OneDrive - System 3/General P/Employee Turnover/test_data_RF.rds")
saveRDS(train_validation_data_rf, "C:/Users/ClaudieCoulombe/OneDrive - System 3/General P/Employee Turnover/train_data_RF.rds")
saveRDS(rf_model, "C:/Users/ClaudieCoulombe/OneDrive - System 3/General P/Employee Turnover/rf_model.rds")

6.2 Assessing Accuracy on Training Set

Show the code
# Generate predictions on the training data
train_preds <- predict(rf_model, newdata = train_validation_data_rf)

# Confusion matrix
confusion_matrix <- table(train_validation_data_rf$left, train_preds)
print(confusion_matrix)
   train_preds
       0    1
  0 9700    1
  1    0 3034
Show the code
# Calculate accuracy
train_accuracy_rf <- (sum(diag(confusion_matrix)) / sum(confusion_matrix))*100
  • Accuracy on training set: 99.9921%

6.3 Assessing Accuracy on Test Set

Show the code
# Generate predictions
test_preds <- predict(rf_model, newdata = test_data)

# Confusion matrix
confusion_matrix <- table(test_data$left, test_preds)
print(confusion_matrix)
   test_preds
       0    1
  0 1723    4
  1   14  523
Show the code
# Calculate test accuracy
test_accuracy_rf <- (sum(diag(confusion_matrix)) / sum(confusion_matrix))*100
  • Accuracy on test set: 99.2049%

6.4 Testing Precision, Recall, Specificity

Show the code
# Create confusion matrix
confusion <- confusionMatrix(
  factor(test_preds),
  factor(test_data$left),
  positive = "1"
)

# Extract precision and recall
precision_rf <- confusion$byClass["Precision"]
recall_rf <- confusion$byClass["Recall"]
specificity_rf <- confusion$byClass["Specificity"]
f1_score_rf <-  confusion$byClass["F1"]

# Print the results
cat("Precision:", precision_rf, "\n")
Precision: 0.9924 
Show the code
cat("Recall (Sensitivity):", recall_rf, "\n")
Recall (Sensitivity): 0.9739 
Show the code
cat("Specificity:", specificity_rf, "\n")
Specificity: 0.9977 
Show the code
cat("F1 Score:", f1_score_rf, "\n")
F1 Score: 0.9831 

7 Comparing Models

                       Logistic Regression   Multilayer Perceptron   Random Forest
Training Set Accuracy                85.24                 97.7609         99.9921
Test Set Accuracy                    84.89                 96.5548         99.2049
Precision                           0.7097                  0.9274          0.9924
Sensitivity (Recall)                0.6145                  0.9274          0.9739
F1 Score                            0.6587                  0.9274          0.9831

Conclusion:

The Random Forest model demonstrates superior performance across all evaluated metrics, suggesting it is the most effective for predicting employee turnover in this context. However, it’s essential to consider the complexity and interpretability of the model. While Random Forest and Neural Networks offer high accuracy, Logistic Regression provides more straightforward interpretability, which can be valuable for understanding the influence of individual predictors on turnover.


8 Predictor Importance

Show the code
#-----------------------------------------------------------------------------
# Display Table --------------------------------------------------------------
#-----------------------------------------------------------------------------

suppressWarnings({
library(sjPlot)
})

suppressMessages({
  
tab_model(logistic_model, 
          title = "Logistic Regression Model Results",
          show.ci = FALSE,   
          show.se = TRUE,    
          show.stat = TRUE,
          show.reflvl = TRUE,
          prefix.labels = "varname",
          CSS = list(
            css.depvarhead = 'text-align: left;',
            css.centeralign = 'text-align: left;',
            css.summary = 'font-weight: bold;'
          ))
  })
Logistic Regression Model Results (outcome: left_numeric)

Predictors                             Odds Ratios  std. Error  Statistic       p
(Intercept)                                   0.69        0.15      -1.70   0.090
satisfaction_level_cat: High             Reference
satisfaction_level_cat: Low                   8.20        0.72      24.13  <0.001
satisfaction_level_cat: Lower-Mid             0.84        0.08      -1.78   0.074
satisfaction_level_cat: Upper-Mid             1.15        0.10       1.59   0.111
last_evaluation_cat: High                Reference
last_evaluation_cat: Low                      1.01        0.08       0.07   0.942
last_evaluation_cat: Lower-Mid                0.14        0.01     -18.80  <0.001
last_evaluation_cat: Upper-Mid                0.43        0.03     -10.84  <0.001
average_monthly_hours_cat: High          Reference
average_monthly_hours_cat: Low                0.80        0.06      -2.80   0.005
average_monthly_hours_cat: Lower-Mid          0.15        0.01     -19.24  <0.001
average_monthly_hours_cat: Upper-Mid          0.37        0.03     -12.43  <0.001
number_project_cat: High                 Reference
number_project_cat: Low                       0.66        0.07      -3.91  <0.001
number_project_cat: Lower-Mid                 0.25        0.03     -11.97  <0.001
number_project_cat: Upper-Mid                 0.57        0.06      -5.12  <0.001
time_spend_company_cat: High             Reference
time_spend_company_cat: Low                   0.24        0.02     -19.15  <0.001
time_spend_company_cat: Upper-Mid             0.29        0.03     -13.40  <0.001
salary: high                             Reference
salary: low                                   6.27        0.92      12.45  <0.001
salary: medium                                3.44        0.51       8.32  <0.001
promotion_last_5years                         0.29        0.09      -4.15  <0.001
department: accounting                   Reference
department: hr                                1.24        0.21       1.32   0.188
department: IT                                0.81        0.12      -1.37   0.171
department: management                        0.67        0.13      -2.02   0.043
department: marketing                         0.89        0.15      -0.68   0.494
department: product_mng                       0.72        0.12      -2.05   0.040
department: RandD                             0.41        0.07      -4.92  <0.001
department: sales                             0.93        0.12      -0.61   0.541
department: support                           0.95        0.13      -0.36   0.715
department: technical                         1.01        0.13       0.07   0.945
work_accident                                 0.21        0.02     -14.46  <0.001
Observations                                 12735
R2 Tjur                                      0.438
Show the code
# Check variable importance on the test set

importance(rf_model) # Numeric importance
                          0      1 MeanDecreaseAccuracy MeanDecreaseGini
satisfaction_level    98.26 335.05               335.13          1661.01
last_evaluation       27.97 180.64               180.83           556.72
average_monthly_hours 83.73 122.66               139.20           642.79
number_project        71.00 220.30               220.08           817.30
time_spend_company    63.71  85.69                93.44           839.60
salary                14.44  49.22                44.73            30.06
department            16.33  92.77                64.18            67.96
Show the code
varImpPlot(rf_model) # Plot variable importance

Show the code
# The mean decrease accuracy tells us how much accuracy decreases when a specific variable is excluded from the model. Higher values mean the variable is more important for accurate predictions. The mean decrease gini tells us how much each variable contributes to reducing impurity (gini index) across all trees. Higher values mean the variable plays a bigger role in splitting nodes and improving classification. Column 0 tells us how important each variable is in predicting staying, while column 1 tells us how important each predictor is in predicting leaving. 

# We can see that satisfaction level is much more important for predicting employees who left than those who stayed. This suggests that low satisfaction is associated with turnover. We can also see that removing satisfaction level would reduce the model's accuracy the most. It also has the highest impact on reducing node impurity. 
  • Both models identify satisfaction level as the most critical predictor of employee turnover.

    • Logistic Regression highlights that employees with low satisfaction levels have 8.2 times the odds of leaving compared to employees with high satisfaction levels.

    • RF ranks satisfaction level as the top feature by both Mean Decrease Accuracy and Mean Decrease Gini.

  • Employees with low salaries are significantly more likely to leave.

    • Logistic Regression assigns an Odds Ratio of 6.27 for low salary, and RF includes salary as a moderately important predictor.
  • Workload and Tenure:

    • Both models suggest that balanced workloads and moderate tenure (e.g., average monthly hours, number of projects, and time spent at the company) reduce turnover risk.

    • Logistic Regression quantifies the relationships (e.g., Odds Ratio = 0.15 for lower-mid monthly hours), while RF provides rankings of feature importance.

  • Departments:

    • Logistic Regression identifies R&D and management as departments with lower turnover risks. RF ranks department as a lower-priority feature overall.
  • For actionable insights (e.g., presenting findings to stakeholders, prioritizing retention initiatives) use Logistic Regression to interpret predictor importance and direction.

  • For predictive tasks (e.g., identifying high-risk employees for targeted interventions), rely on Random Forest for its higher accuracy and robust predictions.
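To make the odds ratios above concrete: an odds ratio multiplies odds, not probabilities. A quick sketch of the conversion (the 10% baseline probability is an illustrative assumption, not a value estimated from this dataset):

```r
# If the baseline (high-satisfaction) probability of leaving were 10%...
baseline_p    <- 0.10
baseline_odds <- baseline_p / (1 - baseline_p)      # odds = p / (1 - p)

# ...an odds ratio of 8.20 for low satisfaction scales the odds:
or_low_sat   <- 8.20
low_sat_odds <- baseline_odds * or_low_sat
low_sat_p    <- low_sat_odds / (1 + low_sat_odds)   # back to a probability, ~0.48
```

So "8.2 times the odds" would correspond here to a jump from a 10% to roughly a 48% probability of leaving, not to an 82% probability.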


9 Limitations

  • Other variables not measured may also have a strong effect on turnover risk (e.g., employee-manager relationships, company culture, external job market conditions).

  • The findings might not generalize to other organizations with different turnover rates, employee populations, or work environments.

  • The dataset is cross-sectional, making it difficult to infer causality or temporal relationships (e.g., whether low satisfaction causes turnover or vice versa).

  • Not all predictors (e.g., department, tenure) are actionable or easily changeable, which limits the organization’s ability to act on these insights.